One goal of circuit design and/or program coding is to optimize some aspect of a system with the goal of improving its quality. One such optimization approach, generally referred to as the retiming of a circuit, is a technique of moving the structural location of latches or registers in a digital circuit in order to improve performance, area, and/or power consumption in such a way that preserves behavior at the circuit's outputs. Automated techniques use a directed graph to represent the digital circuit under consideration, where the vertices of the graph represent asynchronous combinational blocks, and directed edges of the graph represent a series of synchronous registers or latches. Each vertex has a value corresponding to the delay through the combinatorial circuit. After constructing this representation, one can attempt to optimize the circuit by moving delay registers around from input to the output or vice versa.
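The graph representation described above can be illustrated with a minimal sketch. It assumes the conventional retiming formulation in which each vertex v receives an integer label r(v) and an edge from u to v carrying w registers carries w + r(v) - r(u) registers after retiming. The names `retime`, `clock_period`, `delay`, and `edges` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def retime(edges, r):
    """Apply retiming labels r: w_r(u, v) = w(u, v) + r[v] - r[u]."""
    new_edges = {}
    for (u, v), w in edges.items():
        w_r = w + r.get(v, 0) - r.get(u, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would hold {w_r} registers")
        new_edges[(u, v)] = w_r
    return new_edges

def clock_period(delay, edges):
    """Largest total vertex delay along any path of register-free edges.

    Assumes the register-free part of the graph is acyclic.
    """
    succ = defaultdict(list)
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)

# Illustrative circuit: one register in front of two blocks of delay 3 each.
delay = {"in": 0, "a": 3, "b": 3, "out": 0}
edges = {("in", "a"): 1, ("a", "b"): 0, ("b", "out"): 0}

# Label r["a"] = -1 moves the register from the input of "a" to its output.
retimed = retime(edges, {"a": -1})
```

In this example the register ends up between the two blocks, so the longest register-free path covers one block instead of two.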
The description below refers to the accompanying drawings, of which:
A system and method are provided for optimizing code. In one example, a hardware description language (HDL) code representation of a functional element, such as a circuit, is generated from a program model created within a high-level development environment. Tools are provided for retiming or other optimization of the model, such as applying register pipelining to achieve retiming.
In one or more present approaches, an additional functional constraint is introduced to the model-based retiming of the design. In particular, a functional equivalence constraint is introduced to the design synthesis and made a top priority constraint. The functional equivalence constraint provides that a modified model of a component has the same functionality as the original model. Once the functional equivalence constraint is satisfied, then other constraints such as pipeline retiming and so forth can then be applied to the model.
In some embodiments, the functional constraint analyzes three conditions for a component of a graph to determine if functionally equivalent retiming is possible for the component (such as by moving a register across the component). These can include:
states internal to the component having zero as an initial value {initVal=0}
zero input produces a zero output {f(0)=0}
zero input does not result in changed internal states
The functional equivalence analysis can be implemented in several ways. For primitive components, the semantics may be well known in advance. Information concerning functional equivalence can be readily determined or even stored within the model for these components. For example, a simple logical component such as a gain amplifier block is known to accept a zero input, to provide zero output in response to a zero input, and to not have any state changes given zero at the input. Therefore a gain component can be marked to satisfy the functional equivalence condition. On the other hand, a component such as a counter will change its state in response to clock signals, and thus will not pass the functional equivalence test. In still other instances, a logical inverter may prevent zero output. More generally, the functional equivalence analysis can check if the component (1) for a given input value, m, produces the same output value, m, and (2) the component does not change its internal state.
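The three-condition check can be sketched as follows. This is a minimal illustration, not the analyzer's actual implementation: the `Component` record, its fields, and the models of the gain, counter, and inverter blocks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Component:
    """Hypothetical component model: output function, state update, initial state."""
    name: str
    output: Callable[[Any, Any], Any]   # y = output(x, s)
    update: Callable[[Any, Any], Any]   # s_next = update(x, s)
    init_state: Any = None              # None means the component has no internal state

def fe_conditions(c):
    """Evaluate the three conditions at the initial state with a zero input."""
    s0 = c.init_state
    return {
        "init_zero": s0 is None or s0 == 0,
        "zero_in_zero_out": c.output(0, s0) == 0,
        "state_unchanged": c.update(0, s0) == s0,
    }

def is_fe_safe(c):
    return all(fe_conditions(c).values())

# Illustrative models (assumptions): a stateless gain, a counter that advances
# on every clock regardless of its input, and a one-bit logical inverter.
gain = Component("gain", lambda x, s: 4 * x, lambda x, s: s)
counter = Component("counter", lambda x, s: s + 1, lambda x, s: s + 1, init_state=0)
inverter = Component("inverter", lambda x, s: 1 - x, lambda x, s: s)
```

Under these assumed models, the gain passes all three conditions, the counter fails the output and state conditions, and the inverter fails only the zero-output condition.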
For more complex blocks, an initial value propagation based test can check for state changes. The tools can apply input conditions to test the component to determine whether an internal state changes with a zero input applied and, if not, report that it is safe to move a delay block across the component. However, if states do change with an applied input, then it can be concluded that it is unsafe to move a delay register across the component. It can be sufficient to test such components for compliance with zero, non-zero, and unknown inputs, so exhaustive testing can be avoided.
In some embodiments, the set of conditions used by the functional equivalence analysis may be less than or differ from the three conditions listed above.
In some embodiments, a user-designed subsystem may be analyzed to determine if it includes any components that are known to violate the functional equivalence constraints. Any blocks that are unsafe to retime can be used as boundaries to define partitions within the subsystem that groups blocks together that are safe to move. The analysis can suggest to the user that the partitioned subset(s) of the system can be individually retimed. With this approach, a subsystem that would have failed retiming with prior approaches can now be successfully retimed.
In some embodiments, the system includes an Intermediate Representation (IR) generator, a functional equivalence analyzer, a partitioner, a scheduler, an optimizer/pipeline insertion engine, and a code generator such as an HDL code generator. The IR generator receives a high-level specification created by a user. The high-level program specification may be a graphical model, a Stateflow® chart, MATLAB functions, files, or scripts, a Simulink MATLAB block, C, C++, SystemC or other C-like code, AutoESL, a Register Transfer Level (RTL) description such as VHSIC Hardware Description Language (VHDL), Verilog, or the like. The IR generator may create one or more graphs or trees, such as a data flow graph (DFG), based on the high-level program specification. The DFG may include a plurality of interconnected nodes each corresponding to an operation.
The functional equivalence (FE) analyzer then scans the components of the DFG (i.e., the nodes of the DFG) to check for compliance with the necessary conditions for functional equivalence.
The partitioner may then optionally identify components of the DFG that do not pass the FE scan. These failing nodes can then be used as boundaries to partition the DFG into subsections that will individually pass the FE scan.
The scheduler then uses a scheduling algorithm to produce an optimized design for the nodes, or subsections, of the DFG that pass the FE testing. The optimized design, for example, may apply a further constraint such as register pipelining to minimize combinatorial latency for each such subsection of the DFG.
In some embodiments, failing components that do not pass the FE scan may nonetheless be retimed, such as when functional equivalence is deemed to be less important than retiming improvements.
The code generator may then operate on the optimized DFG to generate optimized code.
I. High Level System Overview
The main memory 104 stores a plurality of libraries or modules, such as an operating system 122, and one or more applications running on top of the operating system 122, including a technical computing environment 124. The main memory 104 may also include a code generation system 126. The code generation system 126 may be configured as a toolbox or an add-on product to the high-level technical computing environment 124. Furthermore, a user or developer may create and store a program specification 128 and a control file 130. The control file may be stored on disk or represented in the main memory 104.
The removable medium drive 110 is configured to accept and read a computer readable medium 132, such as a CD, DVD, floppy disk, solid state drive, tape, flash memory or other medium. The removable medium drive 110 may further be configured to write to the computer readable medium 132.
Suitable computer systems include personal computers (PCs), workstations, laptops, palm computers, smart phones, tablets, virtual machines, and other data processing devices. Those skilled in the art will understand that the computer system 100 of
Suitable operating systems 122 include the Windows series of operating systems from Microsoft Corp. of Redmond, Wash., the Linux operating system, the MAC OS® series of operating systems from Apple Inc. of Cupertino, Calif., and the UNIX® series of operating systems, among others.
As indicated above, a user or developer, such as an engineer, scientist, programmer, etc., may utilize the keyboard 116, the mouse 118 and the computer display 120 of the user I/O 106 to operate the high-level technical computing environment 124, and create the program specification 128 and the control file 130.
Suitable high-level technical computing environments for use with embodiments include the MATLAB® and SIMULINK® technical computing environments from The MathWorks, Inc. of Natick, Mass., the LabVIEW programming system from National Instruments Corp. of Austin, Tex., the Visual Engineering Environment (VEE) from Agilent Technologies, Inc. of Santa Clara, Calif., the Khoros development system from AccuSoft Corp. of Northborough, Mass., a C programming system, a JAVA programming system, a C++ programming system, and other C environments, among still other environments. Those skilled in the art will recognize that the computer system 100 need not include any software development environment at all.
Those skilled in the art will understand that the MATLAB® technical computing environment is a math-oriented, textual programming environment for digital signal processing (DSP) design, among other uses. The SIMULINK® technical computing environment is a graphical, block-based environment for modeling and simulating dynamic systems, among other uses.
The code generation system 126 may include a plurality of components or modules. Specifically, the code generation system 126 may include an intermediate representation (IR) generator 203 that is configured to create one or more IRs from the program specification 128.
The code generation system 126 may also include an optimization engine 250 that comprises a functional equivalence analyzer 255, a partitioner 256, a scheduler 257, an optimizer/pipeline insertion engine 258, and a Hardware Description Language (HDL) code generator 260.
The IR generator 203, functional equivalence analyzer 255, partitioner 256, scheduler 257, and the HDL code generator 260 are functions that may each comprise registers and combinational logic configured and arranged to produce sequential logic circuits. In the illustrated embodiment, these functions are software modules or libraries containing program instructions pertaining to the methods described herein, that may be stored on computer readable media, such as computer readable medium 132, and executable by one or more processing elements, such as CPU 102. Other computer readable media may also be used to store and execute these program instructions. In alternative embodiments, various combinations of software and hardware, including firmware, may be utilized to implement the principles taught herein.
II. Functional Equivalence Analyzer 255
The functional equivalence analyzer 255 accepts input as to whether the user wishes to enforce certain suggested functional constraints prior to applying component retiming constraints. To understand how the functional equivalence analyzer operates, consider first the example schematic illustration of a logical system representing a digital circuit or program code to carry out a particular function shown in
The graphical model 300 and other graphical models discussed in this document are meant for illustrative purposes only, and those skilled in the art will recognize that other models, e.g., having different types or arrangements of blocks, etc., may be created by the user. For example, in one embodiment, one or more of the graphical blocks may represent a subsystem that further comprises a plurality of interconnected blocks and/or subsystems. In still other embodiments, the model may originate in other than graphical form, such as a textual model.
A. Example Functional Equivalence Test for Zero Initial Value
In the specific example of
The basic process of retiming the graphical model 300 can involve moving delay blocks, such as the z−1 delay block 303 in the data flow. In the example shown in
With the particular example in
The approach of some embodiments is to ensure that the transformed component model has the same functionality across clock cycles. This functional equivalence requirement can be applied as a constraint prior to applying retiming constraints.
One might consider that a possible solution here is presented by the circuit 330 of
In another situation, it may even be impossible to redesign the circuit such that an equivalent initial state can be provided. Consider the example of
A more desirable solution enabled by some embodiments is to still allow for register re-timing by moving delays in the circuit around but by first applying functional equivalence as a top priority constraint. This approach provides that one does not introduce a different result by moving the delay blocks around. Once functional equivalence is confirmed, then the automated design tool can apply more constraints such as minimizing and/or reducing the retiming.
Two additional constraints are applied in some embodiments: that no additional logic should be introduced into the model, and that the functional equivalence method should be capable of being performed quickly.
A first property to check is to determine whether an initial condition of a retimed circuit provides an initial value of zero for internal states of the function y(t)=f(x(t)).
A second condition to test is whether a zero input to the retimed graph component produces a zero output value, e.g., {f(0)=0}.
<y(t),S(t)>=f(x(t),S(i<t))
This will typically be the case for component functions f(x(t)) 702 where the logic is not strictly combinatorial, e.g., where the component function 702 may assume different internal states S(t). The property to check for such a condition is
f(0,Sinit)=<0,Sinit>, if 0 is the initial value in the delay being moved
or in other words, the test is whether applying a logic 0 to the input does not change the component's 712 state.
1. whether the component has an initial value of zero for its internal state(s) {InitVal=0} (901);
2. whether applying zero at the input produces zero at the output {f(0)=0} (902); and
3. whether applying a 0 at the input does not change the component's state (the component could have conflicting internal states as long as the external state does not change) (903).
These conditions can be graphically depicted as in
If at least two of these tests are true, the component 1000 is a viable candidate for further optimization, such as retiming by moving the delay components 1001, 1002 backward or forward in the pipeline. However, in some embodiments, it can be concluded that retiming is not possible for a component 1000 if the test for one or more of the above conditions is negative or unknown.
For example, a user may set code generation options so that the model may be optimized by implementing retimed pipelines (e.g. via the pipeline insertion engine 258) where multiple instructions or operations are overlapped in time. This may involve reconsidering the placement of registers to break up computation into multiple units and executing a scheduling algorithm to produce a revision to the original graph.
The FE analysis 255 can access the component model 1100 and examine a behavior of the model 1100. In some embodiments, the FE analysis 255 need not apply an exhaustive set of inputs, for example, inputs can be restricted, such as to zero, non-zero, and unknown input states. In some embodiments, output testing can also be restricted, such as for zero, non-zero, and unknown output states.
The first condition is the initial value zero test. This may be checked by examining the component model 1100 to determine its specified initial value(s). It should be understood that if the component model 1100 contains a further sub-graph with multiple elements that may specify initial states, then the initial state values of all such sub-graph elements may be checked.
The second condition, that is, whether applying a zero at the input results in zero at the output, may be determined by exercising the component model 1100 and observing a response to a zero input.
Testing for compliance with the three conditions may include knowing or analyzing the semantics of each component model 1100. A suitable initial value propagation process in
For components made up entirely of static circuit elements, the analysis results for the three tests may be known in advance. The known analysis results may be stored with a model of the component to expedite the analysis for possible retiming. For example, a simple gain block is known in advance to pass all three tests, since a gain block has an initial zero value, produces zero output with zero input, and does not assume internal logic states. However, for other components, such as a counter, it is known in advance that such components will fail the second and third condition (because, for example, a counter will possibly automatically change state and advance to a next value on a next clock cycle, regardless of input values).
The logic for FE analysis 255 may also automatically presume that a user-designed block will not pass the three tests and mark the user-designed block accordingly.
More generally, a CompRoughSemantics process may perform an initial value propagation analysis, using the component model 1100, to determine whether the component results in any state changes. An example of how that process performs this analysis for the third condition is described in connection with
However, with another example, that shown in
If the component being analyzed comprises one or more user-defined circuit blocks or functions, in some embodiments, FE analysis 255 may automatically presume that it will violate at least one of the three rules for functionally equivalent testing.
The FE analysis 255 may be relatively simple. The first two conditions are straightforward input and output checks of the component graph. The analysis for compliance with the third condition may also be simple. For example, if there are no state blocks in the graph then the third constraint may be presumed to pass. And if there are state blocks in the component under analysis, it may then be submitted to semantic value propagation testing. In addition, only zero, nonzero and unknown states can be applied in the semantic testing.
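The restricted value propagation for the third condition can be sketched with a three-valued abstract domain. This is an assumed illustration only: the block encoding, the function `state_stays_zero`, and the abstract rules for add and multiply are hypothetical and are not taken from the CompRoughSemantics process itself.

```python
ZERO, NONZERO, UNKNOWN = "zero", "nonzero", "unknown"

def abs_add(a, b):
    """Abstract addition: zero is the identity; two non-zero values may cancel."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return UNKNOWN

def abs_mul(a, b):
    """Abstract multiplication: any zero operand forces a zero result."""
    if ZERO in (a, b):
        return ZERO
    if a == NONZERO and b == NONZERO:
        return NONZERO
    return UNKNOWN

def state_stays_zero(blocks, states):
    """Propagate a zero input through a component and check its next states.

    blocks: list of (signal name, op, args) in evaluation order (hypothetical encoding).
    states: dict mapping each state name to the signal that supplies its next value.
    The input "x" and every state start at ZERO. Returns True only when every
    next-state signal is provably ZERO; NONZERO or UNKNOWN counts as a failure.
    """
    env = {"x": ZERO}
    env.update({s: ZERO for s in states})
    ops = {"add": abs_add, "mul": abs_mul}
    for name, op, args in blocks:
        if op == "const":
            env[name] = ZERO if args[0] == 0 else NONZERO
        else:
            env[name] = ops[op](env[args[0]], env[args[1]])
    return all(env[nxt] == ZERO for nxt in states.values())

# Accumulator: acc_next = acc + x. A zero input leaves the state at zero.
accumulator_ok = state_stays_zero([("sum", "add", ("acc", "x"))], {"acc": "sum"})

# Counter: cnt_next = cnt + 1. The state changes even with a zero input.
counter_ok = state_stays_zero(
    [("one", "const", (1,)), ("sum", "add", ("cnt", "one"))], {"cnt": "sum"}
)
```

Treating an unknown result as a failure keeps the check conservative, which matches the statement that retiming can be ruled out when a condition is negative or unknown.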
It is possible in some implementations that the InitVal and f(0) tests may use other values to determine functional equivalence. Thus, these tests may be generalized to InitVal=K and f(x)=x. However, it may become more difficult to design components for the generalized InitVal condition (in some embodiments, components assume a zero initial condition value). It can also be difficult to ensure that components provide a given output when given the same input; indeed, ensuring that operating conditions are guaranteed to produce an expected result other than zero output may become an NP-hard problem.
B. Formal Proof for More General Initial Conditions
The Parallel Intermediate Representation (PIR) mentioned above can be more formally defined as PIR=(V,E), where v ∈ V and e ∈ E are the nodes and edges of the PIR graph respectively. In the PIR, it can be assumed that every edge, e ∈ E, has exactly one driving node, referred to as Src(e) ∈ V, and may fan out to a number of destination nodes, Dst(e) ⊂ V.
For illustration of a more general case of non-zero input, referring back to the simple example of
In some embodiments, when a register is retimed, its initial value may also be changed to maintain functional equivalence. For example, in
Instead of determining the initial values for retimed registers, conditions may be defined under which the initial values of retimed registers do not have to change and the circuit is still functionally equivalent. Retiming under these conditions may be called functional equivalent retiming or FE retiming. The conditions for such safe retiming may be formally defined.
Theorem 1: For a node, v ∈ V, let y(t)=F(x(t),s(t)) be its output function at time t, and s(t+1)=S(x(t),s(t)) be its state update function at time t; x(t) and s(t) are its input and internal state variables at time t. For a given constant m, v is said to be retiming-safe in m and a register can safely be moved from output to input (or vice-versa) if and only if it satisfies the following two conditions:
1) F(m, s(0))=m.
2) S(m, s(0))=s(0).
Proof: The conditions simply check that the initial values of v do not change when the input is m and that the output is also always m. Consider
The intuition behind the conditions is to define the steady-state of the circuit during retiming. For a given input value, m, if the circuit produces the same output value, m, and it does not change the internal state, then we say that it is retiming-safe in m. Based on this, a retiming-safe sub-graph may be determined.
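The effect of condition 1 can be checked by simulation. The sketch below is an illustration under simplifying assumptions: the node is stateless, so condition 2 holds trivially, and the helper names `delay_then_f` and `f_then_delay` are hypothetical. It compares a register placed at the input of f with the same register, keeping the same initial value m, moved to the output of f.

```python
def delay_then_f(f, xs, m):
    """Register with initial value m at the input of f: y(0) = f(m), y(t) = f(x(t-1))."""
    reg, ys = m, []
    for x in xs:
        ys.append(f(reg))
        reg = x
    return ys

def f_then_delay(f, xs, m):
    """Same register moved to the output of f: y(0) = m, y(t) = f(x(t-1))."""
    reg, ys = m, []
    for x in xs:
        ys.append(reg)
        reg = f(x)
    return ys

xs = [2, 7, 1]

def gain(x):          # F(0) = 0, so retiming-safe in m = 0
    return 3 * x

def add_one(x):       # F(0) = 1, so not retiming-safe in m = 0
    return x + 1

def affine(x):        # F(5) = 5, so retiming-safe in m = 5 but not in m = 0
    return 2 * x - 5

gain_same = delay_then_f(gain, xs, 0) == f_then_delay(gain, xs, 0)
add_one_same = delay_then_f(add_one, xs, 0) == f_then_delay(add_one, xs, 0)
affine_same_at_5 = delay_then_f(affine, xs, 5) == f_then_delay(affine, xs, 5)
affine_same_at_0 = delay_then_f(affine, xs, 0) == f_then_delay(affine, xs, 0)
```

The two placements can differ only in the first output sample, f(m) versus m, which is why F(m)=m is the deciding property for a stateless node.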
Lemma 1: If nodes u and v are connected back to back and both are retiming-safe in m, then the path through u and v is also retiming-safe in m.
Proof: Assume that u is at the input of v. Since they are both retiming-safe in m, they both satisfy F(m,s(0))=m. When u takes input m, it outputs m on the edge connecting the nodes; thus input to v is also m and thus it also outputs m. Therefore as an entity, the path of u followed by v satisfies F (m, s(0))=m. By the same logic, S(m, s(0))=s(0) for both nodes and thus for the path as a whole. Thus, the whole path is retiming-safe in m.
Theorem 2: Moving a delay with initial value of m in a graph does not change the graph's initial state, if all nodes of the graph are retiming-safe in m, and thus the graph is retiming-safe in m.
Proof: If all nodes in the graph are retiming-safe in m, and m is injected into the graph's inputs, then all edges in the graph acquire a value of m. Thus, if a delay, whose initial value is also m, is moved arbitrarily across a node, it does not change the value on any edge in the graph and thus does not affect any of the initial state.
Thus, if a graph in which all nodes are retiming-safe in m can be constructed, then retiming may be safely applied without having to re-compute the initial state of retimed registers. In many instances it is practical to use m=0. This is because most registers have an initial state of 0 and most arithmetic operations are retiming-safe when m=0, e.g., for addition and multiplication, F(0)=0 irrespective of their other properties.
III. Using FE Analysis 255 to Suggest Subsystem Partitioning (Partitioner 256)
Next is described the operation of the partitioner (element 256 in
In an example shown in
In some embodiments, a concept of fine partitioning is applied. In fine partitioning, partitions with components for which it is safe to move the delay across are created. In effect, offending blocks such as the offending block 1401 become boundaries that divide the subsystem 1400 into one or more new graphs that do exhibit functional equivalence and therefore can individually be retimed. In the example of
Fine partitioning permits improving subsystems 1400 in which only some components fail the FE test. Fine partitioning also reduces retiming cost. Retiming a graph with E edges and V nodes has a complexity of approximately O(V·E·log E); by excluding components that are not functionally equivalent, the approach works on smaller graphs and thus reduces time, as V and E become smaller.
In another example illustrated in
In one such redesign, the user may determine that an offending subsystem that does not pass the FE compliance test should be submitted to retiming anyway. This may be desirable when overriding design considerations indicate FE is not as important for a particular subsystem as other benefits to be obtained by retiming.
One implementation of an FE retiming algorithm thus has two phases: (1) partition the PIR to identify retiming-safe sub-graphs; (2) apply the traditional retiming algorithm on these sub-graphs. The following algorithm finds all retiming-safe sub-graphs in PIR=(V,E) with a time complexity of O(|V|+|E|):
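The algorithm listing itself does not appear in this text. As a hedged stand-in, the sketch below shows one way the first phase could be realized in O(|V|+|E|) time: a breadth-first search over the connected components of the sub-graph induced by the retiming-safe nodes. The function name `retiming_safe_subgraphs`, the use of plain (source, destination) pairs instead of fan-out edges, and the `is_safe` predicate are all assumptions made for illustration.

```python
from collections import defaultdict, deque

def retiming_safe_subgraphs(nodes, edges, is_safe):
    """Group retiming-safe nodes into maximal connected sub-graphs.

    nodes: list of node ids; edges: iterable of (src, dst) pairs;
    is_safe: predicate telling whether a node passed the FE analysis.
    Unsafe nodes act as boundaries and belong to no sub-graph.
    """
    safe = {v for v in nodes if is_safe(v)}
    adj = defaultdict(list)
    for u, v in edges:
        # Keep only edges whose two endpoints are both safe.
        if u in safe and v in safe:
            adj[u].append(v)
            adj[v].append(u)

    seen, parts = set(), []
    for start in nodes:
        if start not in safe or start in seen:
            continue
        seen.add(start)
        queue, part = deque([start]), []
        while queue:
            u = queue.popleft()
            part.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        parts.append(part)
    return parts

# Illustrative chain a -> b -> cnt -> c -> d, where "cnt" fails the FE test.
nodes = ["a", "b", "cnt", "c", "d"]
edges = [("a", "b"), ("b", "cnt"), ("cnt", "c"), ("c", "d")]
parts = retiming_safe_subgraphs(nodes, edges, lambda v: v != "cnt")
```

Each node and each edge is visited a constant number of times, which gives the linear bound; the offending node splits the chain into two sub-graphs that can be retimed separately.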
In the second phase, traditional retiming can be applied on all retiming-safe sub-graphs found in the first phase.
The FE retiming algorithm has several advantages: (a) it is fast, (b) the expensive retiming phase is performed on smaller sub-graphs, (c) it does not need to compute an equivalent initial state, (d) it does not require additional logic to compute initial state, and (e) it is capable of handling complex nodes. Since the PIR has a finite set of node types, retiming-safety may be defined for each node type, thus making it easier to define retiming-safety for the graph. For user-defined functions, e.g., using MATLAB functions or Stateflow charts, constant-propagation may be used to check if the retiming-safety conditions are met.
IV. Example Model and Retimed Model
V. Further Considerations
As described herein, embodiments of the system and method can apply functional equivalence as a primary constraint in implementing a high-level design specification. In some embodiments, only if these functional equivalence constraints are met are further optimizations, such as retiming, applied.
While what has been described as an example is a way to generate an HDL description to be implemented in hardware such as a field programmable gate array or application specific integrated circuit, it should be understood that the same techniques can be used to generate other things, such as program code (such as C code) to be executed on a programmable processor, from a high level description.
Alternative embodiments may use various techniques to split a program for execution on multi-core processors or to create a multi-threaded process or program from a single-threaded process or program.
The foregoing description has been directed to example embodiments. It will be apparent, however, that other variations and modifications may be made to the described embodiments, with the attainment of some or all of their advantages. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of this patent.
This application is a continuation in part of co-pending U.S. patent application Ser. No. 14/096,333 filed Dec. 4, 2013 entitled “Model-Based Retiming with Functional Equivalence Constraints” by Yongfeng Gu and Girish Venkataramani which in turn claims the benefit of U.S. Provisional Patent Application Ser. No. 61/733,255 filed on Dec. 4, 2012 entitled “Model-Based Optimization with Functional Equivalence Constraints” by Yongfeng Gu and Girish Venkataramani, and U.S. Provisional Patent Application No. 61/787,445 filed on Mar. 15, 2013 entitled “Model-Based Retiming with Functional Equivalence Constraints” by Yongfeng Gu and Girish Venkataramani, the entire contents of each of which are incorporated by reference herein.
Number | Date | Country
---|---|---
61733255 | Dec 2012 | US
61787445 | Mar 2013 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14096333 | Dec 2013 | US
Child | 14640239 | | US