1. Field of the Invention
This invention relates to optimization of the skew of a digital synchronous VLSI system and particularly to the acceleration of operation speed and the reduction of complexity.
2. Description of Related Art
With the prevalence of nanometer-scale technology, high-performance integrated circuit design must cope with higher design complexity, higher clock frequencies, narrower interconnect widths, and lower supply voltages, all of which make the design more difficult. Besides speed and area, reliability must be treated as a critical optimization target.
A high-frequency digital synchronous VLSI circuit operates with combinational logic circuits arranged by region; registers must therefore be placed in specified regions of the circuit and are controlled by clock signals. For the digital synchronous circuit to operate normally, the clock signals must be delivered to each of the registers at the proper time, which may differ from register to register, and the delivery of the clock signals therefore depends on both the circuit and the interconnect.
As the clock frequency of a high-speed synchronous circuit is gradually raised, the transmission of the clock signal through the clock distribution network becomes a key point: it limits the achievable clock rate and is a prerequisite for normal operation of the circuit. How to use clock skew effectively to minimize the clock period is the subject of clock skew scheduling and optimization. For the circuit to work normally, the arrival time of the clock signal at each register must satisfy the limits imposed by zero clocking and double clocking; the clock signal need not reach all registers at the same time. Thus, effective use of clock skew not only keeps the circuit operating normally but also raises its performance. Besides the conventional algorithms for clock skew scheduling and optimization, conventional algorithms of polynomial time complexity have been proposed. Under the clock-period constraint, multiple clock skew schedules are obtained. In this invention, besides the optimization of the clock skew schedule, the computation time is also taken into account so that the invention is practicable.
In this invention, a quadratic cost function is used to measure the error between the ideal skew values and a feasible solution. The depth-first search algorithm, the minimum spanning tree, sparse matrix multiplication, and the conjugate gradient algorithm are used to speed up the computation and lower its complexity. The ISCAS'89 benchmark circuits are then used as test circuits to produce simulation results. The methods proposed in this invention allow all the skews in the circuit to reach their target values. When the clock provides the circuit with correct timing, the reliability and performance of the circuit increase.
Now, the present invention will be described more specifically with reference to the following embodiments. It is to be noted that the following descriptions of preferred embodiments of this invention are presented herein for purpose of illustration and description only; it is not intended to be exhaustive or to be limited to the precise form disclosed.
Generally speaking, a digital synchronous VLSI system proposed in a conventional technology comprises three elements:
The combinational logic between the two registers adjacent in proper sequence to each other is called a local data path.
Clock hazards are caused by double clocking and zero clocking. The skew is the difference between the clock arrival times $C_i$ and $C_f$ at two adjacent registers $R_i$ and $R_f$:
$$T_{Skew}(i,f) = C_i - C_f \qquad (1)$$
An improper skew may cause the circuit to operate incorrectly.
Double clocking means that a data signal propagates so quickly that it reaches a register before that register has finished latching the previous data on the same clock edge. The first data value is therefore lost because of a hold-time violation.
As shown in
$$t_{Am}^{Ff} \ge \left(C_f + kT_{CP} + \Delta_L^F\right) + \delta_H^{Ff} \qquad (2)$$
where $t_{Am}^{Ff}$ is the minimum time for the signal to reach the input terminal $D_f$ of register $R_f$, $T_{CP}$ is the period of the clock signal, $\Delta_L^F$ is the tolerance of the clock signal, and $\delta_H^{Ff}$ is the hold time of register $R_f$.
$t_{Am}^{Ff}$ is the sum of two components, the output time of the preceding register $R_i$ and the propagation time of the combinational logic circuit:
$$t_{Am}^{Ff} = t_{Qm}^{Fi} + D_{Pm}^{i,f} \qquad (3)$$
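where $t_{Qm}^{Fi} = C_i + kT_{CP} - \Delta_L^F + D_{CQm}^{Fi}$ is the earliest time at which data appears at the output terminal $Q_i$ of register $R_i$, and $D_{Pm}^{i,f}$ is the minimum propagation time of the combinational logic circuit between the registers $R_i$ and $R_f$. Thus, equation (2) may be changed into:
$$\left(C_i + kT_{CP} - \Delta_L^F + D_{CQm}^{Fi}\right) + D_{Pm}^{i,f} \ge \left(C_f + kT_{CP} + \Delta_L^F\right) + \delta_H^{Ff} \qquad (4)$$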
After it is simplified, an equation is given below:
$$T_{Skew}(i,f) \ge -\left(D_{Pm}^{i,f} + D_{CQm}^{Fi}\right) + \delta_H^{Ff} + 2\Delta_L^F \qquad (5)$$
Equation (5) follows from the double-clocking condition and sets a lower limit on the skew between the registers $R_i$ and $R_f$. The clock period does not appear in equation (5), so this limit is independent of the clock period. Therefore, when double clocking occurs in the circuit, increasing the clock period does not help. For this reason double clocking is the more dangerous hazard; most clock hazards in a circuit are caused by hold-time violations.
Zero clocking means that the clock signal triggers a register before the data signal has reached it; the register then latches an erroneous value because of a setup-time violation.
As shown in
$$t_{AM}^{Ff} \le \left(C_f + (k+1)T_{CP} - \Delta_L^F\right) - \delta_S^{Ff} \qquad (6)$$
where $t_{AM}^{Ff}$ is the maximum time for the signal to reach the input terminal $D_f$ of register $R_f$, $T_{CP}$ is the period of the clock signal, $\Delta_L^F$ is the tolerance of the clock signal, and $\delta_S^{Ff}$ is the setup time of register $R_f$.
$t_{AM}^{Ff}$ is likewise the sum of two components:
$$t_{AM}^{Ff} = t_{QM}^{Fi} + D_{PM}^{i,f} \qquad (7)$$
where $t_{QM}^{Fi} = C_i + kT_{CP} + \Delta_L^F + D_{CQM}^{Fi}$ is the latest time at which data appears at the output terminal $Q_i$ of register $R_i$, and $D_{PM}^{i,f}$ is the maximum propagation time of the combinational logic circuit between the registers $R_i$ and $R_f$. Thus, equation (6) may be changed into:
$$\left(C_i + kT_{CP} + \Delta_L^F + D_{CQM}^{Fi}\right) + D_{PM}^{i,f} \le \left(C_f + (k+1)T_{CP} - \Delta_L^F\right) - \delta_S^{Ff} \qquad (8)$$
After it is simplified, an equation is given below:
$$T_{Skew}(i,f) \le T_{CP} - \left(D_{PM}^{i,f} + D_{CQM}^{Fi} + \delta_S^{Ff}\right) - 2\Delta_L^F \qquad (9)$$
Equation (9) follows from the zero-clocking condition and sets an upper limit on the skew between the registers $R_i$ and $R_f$. From equation (9), two critical results may be known:
In synchronous circuit design it is commonly assumed that all the registers are triggered at the same time, the so-called zero-skew target. In fact, the circuit works normally as long as the arrival time of the clock signal at each register satisfies the limits of zero clocking and double clocking. The skew must therefore lie within a permissible range for the local data path to work safely.
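As an illustration of equations (5) and (9), the permissible range of skew for one local data path can be computed as in the following minimal Python sketch. The function name, the parameter names, and the numeric values are assumptions made for this example only and are not taken from the invention; using the midpoint of the range as the ideal skew is likewise only one possible choice.

```python
def skew_safe_range(t_cp, d_p_min, d_p_max, d_cq_min, d_cq_max,
                    hold, setup, delta):
    """Return (lower, upper) limits on T_Skew(i, f) = C_i - C_f.

    lower comes from the double-clocking (hold) condition, equation (5);
    upper comes from the zero-clocking (setup) condition, equation (9).
    All arguments are hypothetical names for the quantities in the text.
    """
    lower = -(d_p_min + d_cq_min) + hold + 2.0 * delta
    upper = t_cp - (d_p_max + d_cq_max + setup) - 2.0 * delta
    if lower > upper:
        raise ValueError("no permissible skew for this clock period")
    return lower, upper


def ideal_skew(lower, upper):
    """Assumed choice of ideal skew: the midpoint of the safe range."""
    return 0.5 * (lower + upper)


# Illustrative numbers only (arbitrary time units).
lo, hi = skew_safe_range(t_cp=10.0, d_p_min=2.0, d_p_max=6.0,
                         d_cq_min=1.0, d_cq_max=1.0,
                         hold=0.5, setup=0.5, delta=0.25)
mid = ideal_skew(lo, hi)
```

Note that the lower limit does not depend on `t_cp`, which mirrors the observation that lengthening the clock period cannot cure double clocking.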
After the safe range of skew in the local data path is defined, the skew scheduling and optimization is implemented. As shown in
at step 1 of reading a test circuit 101, the circuit to be tested is read into the program;
at step 2 of building up a circuit model 102, the circuit is pre-processed and analyzed before optimization, including establishment of the circuit model and analysis of the linear dependence in the model;
at step 3 of building up a minimum spanning tree 103, a minimum spanning tree is built up;
at step 4 of building up a matrix M=B×B^t 104, the matrix M=B×B^t is built up;
at step 5 of calculating a Lagrange multiplier 105, the conjugate gradient algorithm is used to calculate the Lagrange multiplier;
at step 6 of calculating a feasible solution to the skew 106, a feasible solution to the skew is calculated;
at step 7 of checking the conditions 107, the solution is checked against the conditions Bs=0 and l≦s≦u;
at step 8 of fine tuning the value of the ideal skew 108, if the conditions of step 7 are not met, the value of the ideal skew is fine tuned and the procedure returns to step 5;
at step 9 of outputting a result 109, if the conditions of step 7 are met, the result is outputted.
A digital VLSI circuit comprises many combinational logic circuits and registers, and a simple digital synchronous circuit, as shown in
Usually there is only one path (local data path) between two adjacent registers, but two adjacent registers may also be joined by two or more connections. Such cases are handled in one of two ways. In the first case, when the connections have the same direction, the safe ranges of skew of the connections are intersected: ∩[lz(i), uz(i)]. In the second case, when the connections have opposite directions, one direction is fixed as the final direction; for each connection whose direction differs from the one specified, the maximum and minimum values of its safe range of skew are exchanged and negated, and the result is then intersected with the safe ranges of skew of the other connections: ∩[lz(i), uz(i)].
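The two cases above can be sketched in Python as follows. The function name and the tuple layout (lower, upper, same_direction) are hypothetical choices for this illustration, not part of the invention.

```python
def merge_parallel_ranges(paths):
    """Intersect the safe skew ranges of several connections that join
    the same pair of registers.

    paths: iterable of (lower, upper, same_direction) tuples, where
    same_direction is True when the connection runs in the chosen
    reference direction and False when it runs the opposite way.
    """
    lo, hi = float("-inf"), float("inf")
    for lower, upper, same_direction in paths:
        if not same_direction:
            # Opposite direction: exchange the limits and negate them.
            lower, upper = -upper, -lower
        lo, hi = max(lo, lower), min(hi, upper)
    if lo > hi:
        raise ValueError("the safe ranges do not intersect")
    return lo, hi


# Two connections in the same direction.
same = merge_parallel_ranges([(-2.0, 2.0, True), (-1.0, 3.0, True)])
# One connection in the reference direction, one opposite to it.
mixed = merge_parallel_ranges([(-2.0, 2.0, True), (-3.0, 1.0, False)])
```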
Generally, a large circuit comprises a great many feedback and feedforward data paths. As shown in
C1=R1, s1, R3, s4, R2, s3, R1
C2=R2, s4, R3, s2, R4, s5, R2
and thus equation (10) is given below:
$$\begin{aligned} s_1 - s_3 + s_4 &= 0 \\ s_2 - s_4 + s_5 &= 0 \end{aligned} \qquad (10)$$
and next, equation (10) is expressed in the form of matrix:
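Assuming that the skew vector is ordered as s=[s1 s2 s3 s4 s5]^t and that the two loop equations read s1−s3+s4=0 and s2−s4+s5=0, the matrix form can be sketched as follows. This is a reconstruction for illustration only, with the loop matrix named B as in the later sections:

```latex
B s =
\begin{bmatrix}
1 & 0 & -1 &  1 & 0 \\
0 & 1 &  0 & -1 & 1
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
```

With this ordering the first two columns of B form a unit matrix, so B has the form [I C] that is used later when the matrix M is built.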
In this invention, two methods of skew optimization are considered, a least-square error method and an interior-point method, which are described below.
It is assumed that a circuit model has been built up, comprising r registers and p local data paths, in which p>r, so there are p skew values s=[s1 s2 . . . sp]^t to be calculated. Since the number of registers is r, only r−1 local data paths are required to connect all the registers; the skews of these r−1 local data paths are called main skews, and the skews of the remaining p−r+1 local data paths are called chord skews. Thus, s may be written as s=[sc sb], in which sc=[s1 s2 . . . sp−r+1]^t is the chord skew and sb=[sp−r+2 . . . sp]^t is the main skew. Each skew si has its own safe range [li, ui], and an intermediate value in the safe range is taken as the ideal skew. Thus, a set of p ideal clock shift values g=[g1 g2 . . . gp]^t is likewise given and, like s=[sc sb], may be partitioned as g=[gc gb]. The aim is for the practical clock shift s=[s1 s2 . . . sp]^t to be as close as possible to the ideal clock shift g=[g1 g2 . . . gp]^t. Thus, this issue is put into formula for the least-square error method:
where upper and lower limits are applied to each clock shift s, and the clock shifts must satisfy the linear dependence among them; thus, the constraints of equation (12) are:
Bs=0
l≦s≦u
equation (12) is expressed in the form of matrix:
$$\varepsilon = (s-g)^2 = s^2 - 2g^t s + g^2 = s^t s - 2g^t s + g^t g$$
where $g^t g$ is a constant and $\tilde{\varepsilon} = s^t s - 2g^t s$; thus, equation (12) is changed into:
$$\min\ \tilde{\varepsilon} = s^t s - 2g^t s \qquad (13)$$
subject to: Bs=0
where the constraint l≦s≦u is temporarily neglected, and the upper and lower limits are not checked until the clock shift s has been fully calculated.
Next, a conventional quadratic-programming technique is used to calculate a feasible solution to the clock shift s=[sc sb]. First, a Lagrange multiplier λ is introduced into equation (13) to form the Lagrangian function $\mathcal{L}(s,\lambda)$:
$$\mathcal{L}(s,\lambda) = \tilde{\varepsilon} + \lambda^t Bs = s^t s - 2g^t s + \lambda^t Bs \qquad (14)$$
The minimum of $\tilde{\varepsilon}$ is a stationary point of the Lagrangian function, $\nabla\mathcal{L}(s,\lambda)=0$, so the first-order partial derivatives are zero, namely $\nabla_s\mathcal{L}(s,\lambda)=0$ and $\nabla_\lambda\mathcal{L}(s,\lambda)=0$.
expressed in the form of matrix:
From equation (15), a second-row element B is eliminated in the Gaussian elimination method; namely, a first-row element is multiplied by
and then added to the second-row element for a result given below:
From equation (16), two formulae may be given:
and if s and g are expressed as s=[sc sb] and g=[gc gb], then equation (17) may be changed into:
The matrix B and the vector g in equation (13) are known, and M is the matrix obtained by multiplying B by its transpose, so the practical clock shift s=[sc sb] in the circuit can be calculated.
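The elimination described above can be written out as follows. This is a reconstruction from equations (13) and (14) under the assumption that M denotes BB^t; it is offered only as a sketch of the derivation, and the grouping of steps may differ from the original numbered equations.

```latex
% Stationarity of the Lagrangian of equation (14)
\nabla_s \mathcal{L} = 2s - 2g + B^t \lambda = 0, \qquad
\nabla_\lambda \mathcal{L} = B s = 0

% The same two conditions in matrix form
\begin{bmatrix} 2I & B^t \\ B & 0 \end{bmatrix}
\begin{bmatrix} s \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 2g \\ 0 \end{bmatrix}

% Multiply the first row by -\tfrac{1}{2}B and add it to the second row
\begin{bmatrix} 2I & B^t \\ 0 & -\tfrac{1}{2} B B^t \end{bmatrix}
\begin{bmatrix} s \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 2g \\ -B g \end{bmatrix}

% With M = B B^t this gives the two formulae
M \lambda = 2 B g, \qquad s = g - \tfrac{1}{2} B^t \lambda
```

Since B has full row rank, M is symmetric positive definite, which is the property the conjugate gradient algorithm relies on when solving for λ.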
The interior-point method has been widely used for optimization problems in recent years because it reaches a feasible solution in relatively few iterations, so that processing time is greatly reduced. In this invention, however, the final experimental data are obtained with the least-square error method.
Generally speaking, the interior-point method is based on three methods: the Fiacco and McCormick logarithmic barrier function method, which handles optimization with inequality constraints; Lagrange's method, which handles optimization with equality constraints; and Newton's iteration method, which solves unconstrained systems of non-linear equations. The following original problem of the interior-point method is considered:
It comprises a quadratic objective function, an equality constraint, and an inequality constraint.
The logarithmic barrier function method uses a barrier parameter μ that starts from a positive initial value and is gradually reduced towards 0. Theoretically, as μ is reduced to 0, the solution of the barrier problem tends towards the minimum of the original problem. The interior-point method comprises the following steps, as shown in
At step 1 of building up an initial value 201, initial values of the primal variables and the dual variables are set.
At step 2 of calculating a barrier parameter 202, the barrier parameter is calculated; if the solution has converged, the procedure ends, and if not, the procedure goes to step 3.
At step 3 of solving a matrix-vector equation 203, a matrix-vector equation is solved.
At step 4 of solving variates 204, the increments of the primal variables and of the dual variables are calculated.
At step 5 of calculating a step size 205, the step sizes of the primal and dual variables are calculated.
At step 6 of updating variables 206, all the primal and dual variables are updated.
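The role of the barrier parameter can be illustrated on a deliberately reduced problem: a single skew s with bounds l < s < u and the objective (s − g)^2. The Python sketch below applies a logarithmic barrier with Newton iterations and a step size that keeps the iterate strictly inside the bounds. It is not the full primal-dual procedure of steps 201 to 206 and it omits the equality constraint Bs=0; the function name and all parameter values are assumptions for this example.

```python
def barrier_minimize(g, l, u, mu=1.0, shrink=0.2, mu_min=1e-10,
                     newton_iters=50):
    """Minimize (s - g)**2 subject to l < s < u using the barrier function
    (s - g)**2 - mu * (log(s - l) + log(u - s)) with mu driven towards 0."""
    s = 0.5 * (l + u)                      # strictly interior starting point
    while mu > mu_min:
        for _ in range(newton_iters):
            grad = 2.0 * (s - g) - mu / (s - l) + mu / (u - s)
            hess = 2.0 + mu / (s - l) ** 2 + mu / (u - s) ** 2
            step = -grad / hess            # Newton direction
            t = 1.0
            while not (l < s + t * step < u):
                t *= 0.5                   # shrink the step to stay interior
            s += t * step
            if abs(t * step) < 1e-14:
                break
        mu *= shrink                       # reduce the barrier parameter
    return s


inside = barrier_minimize(g=1.0, l=0.0, u=2.0)   # ideal value is feasible
clipped = barrier_minimize(g=5.0, l=0.0, u=2.0)  # ideal value is above u
```

When the ideal value lies inside the bounds the barrier has no lasting effect; when it lies outside, the result approaches the nearest bound from the inside as μ shrinks.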
After the clock shift problem has been formulated as a quadratic problem, the practical clock shift s=[sc sb] is calculated from the known values. Next, the method is implemented as a program, in which the following four important steps deserve particular attention.
When a circuit is read in, its interconnection must be built up before the clock shift can be calculated, so the circuit must be modeled. The clock shift is the difference between the clocks of adjacent registers, so the circuit model must be register-based. In this invention, the depth-first search algorithm is used to construct the circuit model, and it is implemented recursively: the routine calls itself until no further node remains. After the circuit model has been constructed in this way, the logic gates in the circuit, unlike the registers, are recorded only as values, and the resulting model consists of registers connected to one another.
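A recursive depth-first search of this kind might look like the following Python sketch. The netlist representation (a fan-out dictionary and a per-gate delay dictionary), the function name, and the choice to record each register-to-register path as a (minimum delay, maximum delay) pair are assumptions for illustration. The sketch enumerates every path and assumes that the logic between registers is acyclic, so it is not tuned for large circuits.

```python
def build_register_model(fanout, delay, registers):
    """Collapse a gate-level netlist into a register-to-register model.

    fanout:    dict mapping each node to the list of nodes it drives
    delay:     dict mapping each logic gate to its propagation delay
    registers: set of node names that are registers
    Returns a dict {(source_reg, dest_reg): (min_delay, max_delay)}.
    """
    model = {}

    def dfs(src, node, acc):
        for nxt in fanout.get(node, []):
            if nxt in registers:
                # Reached the next register: record the path as values only.
                lo, hi = model.get((src, nxt), (acc, acc))
                model[(src, nxt)] = (min(lo, acc), max(hi, acc))
            else:
                # Still inside combinational logic: recurse through the gate.
                dfs(src, nxt, acc + delay[nxt])

    for reg in sorted(registers):
        dfs(reg, reg, 0.0)
    return model


# Hypothetical netlist: R1 -> g1 -> g2 -> R2, R1 -> g2 -> R2, R2 -> g3 -> R1
fanout = {"R1": ["g1", "g2"], "g1": ["g2"], "g2": ["R2"],
          "R2": ["g3"], "g3": ["R1"]}
delay = {"g1": 1, "g2": 2, "g3": 3}
model = build_register_model(fanout, delay, {"R1", "R2"})
```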
From the description above, before the clock shift problem is transformed into the quadratic problem, all the independent loops in the circuit must be found. Among the methods of finding all the loops, the null space method is widely used but inefficient. Thus, in this invention, at step 3 in
1. A simple example is given for explanation, as shown in
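One way to obtain independent loops from a spanning tree is sketched below in Python: every local data path that is not in the tree (a chord) closes exactly one loop together with the tree path between its end registers. The edge directions, the weights, and the function names are assumptions chosen so that the result matches the loop equations s1−s3+s4=0 and s2−s4+s5=0; they are not taken from the figures.

```python
from collections import defaultdict


def kruskal_tree(n, edges, weights):
    """Return the indices of the edges of a minimum spanning tree."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for k in sorted(range(len(edges)), key=lambda k: weights[k]):
        ra, rb = find(edges[k][0]), find(edges[k][1])
        if ra != rb:
            parent[ra] = rb
            tree.append(k)
    return tree


def loop_matrix(edges, tree):
    """One row per chord: +1 for an edge traversed along its direction,
    -1 for an edge traversed against it, 0 otherwise."""
    adj = defaultdict(list)
    for k in tree:
        a, b = edges[k]
        adj[a].append((b, k, +1))
        adj[b].append((a, k, -1))

    def tree_path(src, dst):
        stack = [(src, -1, [])]
        while stack:
            node, prev, path = stack.pop()
            if node == dst:
                return path
            for nxt, k, sign in adj[node]:
                if k != prev:
                    stack.append((nxt, k, path + [(k, sign)]))
        raise ValueError("registers are not connected by the tree")

    rows, in_tree = [], set(tree)
    for c, (a, b) in enumerate(edges):
        if c in in_tree:
            continue
        row = [0] * len(edges)
        row[c] = 1                          # the chord itself, a -> b
        for k, sign in tree_path(b, a):     # return from b to a in the tree
            row[k] = sign
        rows.append(row)
    return rows


# Assumed directions: s1: R1->R3, s2: R3->R4, s3: R1->R2, s4: R3->R2, s5: R4->R2
edges = [(0, 2), (2, 3), (0, 1), (2, 1), (3, 1)]
tree = kruskal_tree(4, edges, weights=[2, 2, 1, 1, 1])
B = loop_matrix(edges, tree)
```

With r=4 registers and p=5 paths the tree has r−1=3 edges and there are p−r+1=2 chords, so B has two rows.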
Matrix multiplication takes a great deal of time and memory: if two matrices are multiplied naively, the complexity reaches O(n^3), in which n is the side length of the matrix. Thus, sparse matrix multiplication (G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1996; F. Y. Chang, A x=b in C++: sparse matrix technique, NCTU Press, 2002) is used at this step of the invention to save time. The step computes the matrix M=BB′, in which B=[I C], and thus
the form of the unit matrix I is fixed, so only the non-identity part of B, that is, the matrix C, needs to be stored, and only CC′ needs to be computed to obtain the matrix M.
Although this saves most of the computation time and memory, the interconnect of a digital VLSI circuit is very complicated, so the matrix multiplication CC′ must still be improved. It is found in this invention that the matrix CC′ is very sparse, most of its values being 0, so that storing and multiplying all the elements wastes memory and time. Thus, the concept of sparse matrix multiplication is used in this invention to record only the non-zero elements of the matrix C, as follows:
1. The row index of each non-zero element is recorded sequentially.
2. The column index of each non-zero element is recorded sequentially.
3. The value of each non-zero element is recorded sequentially.
Other zero elements are not recorded. As shown in
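The three recorded sequences correspond to what is commonly called triplet (coordinate) storage. The following Python sketch stores C in that form and builds M=BB′=I+CC′ from the non-zero elements only. The function names and the example matrix are assumptions for illustration.

```python
def to_triplets(C):
    """Record row index, column index and value of each non-zero element."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(C):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def build_m(rows, cols, vals, n_rows):
    """Return M = I + C C' as a dict {(i, k): value} of non-zero entries.

    (C C')[i][k] is the sum over columns j of C[i][j] * C[k][j], so only
    pairs of non-zero elements that share a column contribute.
    """
    by_col = {}
    for i, j, v in zip(rows, cols, vals):
        by_col.setdefault(j, []).append((i, v))

    m = {(i, i): 1 for i in range(n_rows)}       # the unit matrix part
    for entries in by_col.values():
        for i, vi in entries:
            for k, vk in entries:
                m[(i, k)] = m.get((i, k), 0) + vi * vk
    return {key: v for key, v in m.items() if v != 0}


# Hypothetical C taken from B = [I C] of a two-loop, five-path example.
C = [[-1, 1, 0],
     [0, -1, 1]]
rows, cols, vals = to_triplets(C)
M = build_m(rows, cols, vals, n_rows=2)
```

The work is proportional to the number of non-zero pairs per column rather than to the cube of the matrix dimension.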
After the clock shift problem has been transformed into the quadratic problem and the Lagrange multiplier has been introduced, a series of deductions gives equation (17), which is solved at step 105 with the conjugate gradient algorithm. In equation (17), the form of Lagrange multiplier λ=2
The conjugate gradient algorithm is a renowned algorithm in scientific computing, and the Krylov subspace iteration algorithms were developed from it. The conjugate gradient algorithm was first proposed by Hestenes and Stiefel in 1952. The CG algorithm solves symmetric positive definite systems quite effectively; in each iteration it generates a sequence of direction vectors that are used to update the iterate and the residual vector. Although the generated sequences grow longer and longer, only a small number of vectors need to be stored. Thus, the iteration of the conjugate gradient algorithm is used in this invention to obtain the Lagrange multiplier λ=2
Basically, the conjugate gradient algorithm uses conjugate directions as search directions, and in each iteration a proper scalar, called the step size, is chosen so that the iteration converges rapidly.
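A textbook conjugate gradient iteration, applied to the skew problem, might be sketched in Python as follows. The sketch assumes the relations Mλ=2Bg with M=BB′ and s=g−½B′λ, which follow from setting the gradient of the Lagrangian of equation (14) to zero. The function names, the dense list-of-lists storage of B, and the example values of g are assumptions for illustration.

```python
def conjugate_gradient(matvec, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given as matvec."""
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for x = 0
    d = list(r)                       # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ad = matvec(d)
        alpha = rs_old / sum(di * adi for di, adi in zip(d, Ad))  # step size
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = sum(v * v for v in r)
        # New direction, conjugate to the previous ones with respect to A.
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x


def solve_skew(B, g):
    """Return s closest to g in the least-square sense with B s = 0."""
    n_loops, n_paths = len(B), len(g)

    def m_times(v):                   # computes (B B') v without forming M
        bt_v = [sum(B[i][k] * v[i] for i in range(n_loops))
                for k in range(n_paths)]
        return [sum(B[i][k] * bt_v[k] for k in range(n_paths))
                for i in range(n_loops)]

    rhs = [2.0 * sum(B[i][k] * g[k] for k in range(n_paths))
           for i in range(n_loops)]
    lam = conjugate_gradient(m_times, rhs)
    return [g[k] - 0.5 * sum(B[i][k] * lam[i] for i in range(n_loops))
            for k in range(n_paths)]


B = [[1, 0, -1, 1, 0],
     [0, 1, 0, -1, 1]]
s = solve_skew(B, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Only the vectors x, r, d and Ad are kept between iterations, which is the small storage requirement mentioned above.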
Finally, at step 6 shown in
As shown in
If the obtained skew does not meet the threshold conditions, the skew s is out of the safe range, and the ideal skew g must be fine tuned. If the obtained skew s is higher than the upper bound of the safe range, the ideal skew g is lowered; conversely, if the obtained skew s is lower than the lower bound of the safe range, the ideal skew g is raised. The amount by which the ideal skew g is adjusted depends on the margin between the skew s and the boundary value of the safe range: when the skew s is far from the boundary value, a larger adjustment of the ideal skew g is made; when the skew s is close to the boundary value, a smaller adjustment is made. Adjustment is made step by step in this way until the obtained skew s meets the threshold conditions, and the iteration then stops.
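The adjustment rule can be sketched in Python as below. The proportional update with a `gain` factor and the function name are assumptions; the text states only that the adjustment grows with the distance between the skew and the violated bound.

```python
def tune_ideal_skew(g, s, lower, upper, gain=1.0):
    """Return an adjusted copy of the ideal skews g.

    For every path whose computed skew s[k] violates its safe range, the
    ideal value g[k] is moved in the opposite direction by an amount
    proportional to the size of the violation.
    """
    new_g = list(g)
    for k, (sk, lo, hi) in enumerate(zip(s, lower, upper)):
        if sk > hi:
            new_g[k] -= gain * (sk - hi)   # above the upper bound: lower g
        elif sk < lo:
            new_g[k] += gain * (lo - sk)   # below the lower bound: raise g
    return new_g


# Hypothetical values: the first skew is too high, the third is too low.
g_new = tune_ideal_skew(g=[0.0, 1.0, 2.0], s=[3.0, 1.0, -1.0],
                        lower=[-2.0, -2.0, 0.0], upper=[2.0, 2.0, 4.0])
```

In the overall flow this function would be called after each solve, and the solve repeated with the adjusted g until no skew violates its range.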
The platform used in this invention is a SUN Blade 2000 workstation running the Solaris 8 operating system; the program is written in ANSI C and compiled with gcc. The ISCAS'89 benchmark circuits are used as test circuits. A smaller circuit is used here as an example of the output file generated by the program; the output file contains the total number of logic gates in the circuit, the number of registers, the number of local data paths, the clock period, and the feasible solutions to all the skews.
Line 1 indicates the circuit name; r, p, Spanning, Clock, CG#, Residual, and Run Time in lines 2˜9 indicate the number of registers, the number of local data paths, the number of spanning tree nodes, the clock period, the number of iterations of the conjugate gradient algorithm, the residual error, the number of times the skew s is calculated, and the running time of the program, respectively; and line 9 indicates the running time of the program in the conventional technology, in which * stands for the minimum.
To compare zero skew with non-zero skew in terms of circuit reliability, two ISCAS'89 test circuits, s838.1 and s5378, are taken as examples; the analysis results are shown in
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiment. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures.