The present invention is related to optimising the configuration of re-configurable logic devices by providing an apparatus and method for optimising a hardware design implemented on a programmable architecture.
Certain reconfigurable devices/fabrics are commonly constructed from multiple instances of a single user programmable logic tile. These tiles represent the fundamental building blocks of every logic circuit which is designed using that particular reconfigurable device/fabric.
One of these tiles typically comprises registers associated with logic elements such as Arithmetic Logic Units (ALUs) or multiplexers. In order to perform a specific function, these tiles must be interconnected in a specific way. The information related to how these tiles are interconnected is found in what is known as a netlist.
In order to maximise the performance of a design mapped onto a reconfigurable device/fabric, it is important to find the optimal location of each register cell such that the longest path between any two registers is minimised. This technique of moving the structural location of latches or registers in a digital circuit in order to improve its performance, area, and/or power characteristics is known as “retiming”. There are several known approaches to retiming, most of which are based on the use of a retiming algorithm or weighted retiming function.
A first approach to retiming consists of using a retiming algorithm during the synthesis stage of development. At this point, because the netlist has not yet been placed onto the device/fabric, the interconnection delay must first be estimated using a mathematical model. The lengths of the paths between the registers are then calculated using the measured delays of each logic cell and the estimated interconnection delays. Finally, these lengths are used by the retiming algorithm to place the elements onto the fabric. This technique suffers from being entirely dependent on the accuracy of the model used to estimate the interconnection delays. An inefficient or incorrect model can cause the algorithm to choose an inefficient design.
In order to provide a solution to this problem, a technique was developed which involved performing the retiming during the placement stage of the circuit's design. This approach sees the retiming algorithm being executed after each placement iteration, at which point the registers can be rearranged to optimise the paths therebetween. One significant advantage of this technique is that the model used to determine the interconnection delay may incorporate into its calculations a certain amount of placement information, data which would not have been available at the synthesis stage. Thus, although this model will still partially rely on an estimate of the routing delay, it will be more accurate than a model used in the synthesis stage of development. The retiming algorithm that uses this new model will, however, be constrained by the fact that the new arrangement of registers may not be easily placeable, thereby increasing the possibility of invalidating the optimal placement solution found during any one iteration.
In order to increase the accuracy of the retiming process further, a technique has been developed where the register retiming is performed during the routing stage. In this scenario, all actual routing information is known in that the paths between the registers are fixed. Accordingly, this technique does not require the use of a model in order to determine the interconnection delay. At this stage, however, because the register placement cannot be modified, retiming will have little or no impact on the performance of the circuit.
Thus, each of the above techniques suffers particular disadvantages. Although the techniques of retiming during the earlier stages of development provide greater flexibility in terms of register position, they also suffer from having to use approximate timing models. Conversely, retiming at a later stage provides more accurate timing data but limited flexibility due to the difficulties associated with repositioning registers.
Accordingly, there is a clear need for a new method of retiming which provides a high level of timing accuracy and the flexibility to change routing paths after the placement phase.
In order to solve the above problems, the present invention provides a method of minimising the longest delay path between two logic elements of a circuit placed on a reconfigurable device, each logic element being associated with a register and the reconfigurable device including logic elements and associated registers which are programmed to be transparent, the method comprising the steps of:
determining a number of possible routing paths for connecting the two logic elements of the circuit through a specific register associated with one of the logic elements, including at least one path which passes through at least one register which is programmed to be transparent;
selecting a routing path based on at least one routing path criterion including whether each routing path passes through a register which is programmed to be transparent;
calculating, for each respective transparent register through which the selected path is routed, by how much the longest delay between the two logic elements would be reduced by activating the respective transparent register and programming the specific register to be transparent;
determining, based on the results of the calculating step, which, if any, transparent register would maximise the reduction in the longest delay; and
if a transparent register was determined in the determining step, programming the determined transparent register to be active and programming the specific register to be transparent.
Preferably, the at least one routing path criterion further includes the overall delay of each path.
Preferably, the at least one routing path criterion further includes the congestion of each routing path.
Preferably, the method further comprises the step of:
setting the maximum frequency of the circuit based on the maximum delay path.
Preferably, the step of programming the determined transparent register to be active and programming the specific register to be transparent comprises the steps of:
configuring the specific transparent register as a route-through register; and
configuring the determined transparent register as a clocked register.
The present invention further provides an apparatus for minimising the longest delay path between two logic elements of a circuit placed on a reconfigurable device, each logic element being associated with a register and the reconfigurable device including logic elements and associated registers which are programmed to be transparent, the apparatus comprising:
path determining means for determining a number of possible routing paths for connecting the two logic elements of the circuit through a specific register associated with one of the logic elements, including at least one path which passes through at least one register which is programmed to be transparent;
selecting means for selecting a routing path based on at least one routing path criterion, including whether each routing path passes through a register which is programmed to be transparent;
calculating means for calculating, for each respective transparent register through which the selected path is routed, by how much the longest delay between the two logic elements would be reduced by activating the respective transparent register and programming the specific register to be transparent;
transparent register determining means for determining, based on the calculations made by the calculating means, which, if any, transparent register would maximise the reduction in the longest delay; and
programming means for, if a transparent register was determined by the transparent register determining means, programming the determined transparent register to be active and programming the specific register to be transparent.
Preferably, the at least one routing path criterion further includes the overall delay of each path.
Preferably, the at least one routing path criterion further includes the congestion of each routing path.
Preferably, the apparatus further comprises:
setting means for setting the maximum frequency of the circuit based on the maximum delay path.
Preferably, the programming means further comprise:
configuring means for configuring the specific transparent register as a route-through register; and
configuring means for configuring the determined transparent register as a clocked register.
The reconfigurable device may be a Field Programmable Gate Array (FPGA) circuit.
As will be appreciated, the present invention provides several advantages. For example, because the present invention provides a solution which can be implemented after the placement and routing phase, accurate timing and delay information will be available. The present invention does not involve the physical moving of registers. Instead, the method of the present invention effectively swaps the activation states of registers using their transparency flags. Therefore, because the present invention makes use of unused registers, retiming of the circuit can be accomplished with minimal disruption to the existing registers, thereby resulting in a circuit which has minimised longest delay paths. Accordingly, the present invention provides a retiming method and system which has increased flexibility and effectiveness, thereby resulting in more efficiently optimised logic circuits. These advantages will permit a circuit which has been designed in accordance with the method of the present invention to run at an increased maximum frequency.
An example of the present invention will now be described with reference to the accompanying drawings, in which:
With reference to
The path configuration in
The logic elements in this example can also be used as route-through resources. For ALUs, this is achieved by programming them with a “propagate input” function. For multiplexers, this is done by routing a constant signal to their selection input rather than using the input coming from the routing network.
Registers are situated on some of the output buses of the above elements. As can be seen from
Each register can be clocked or transparent. This state is specified by a configurable state holder known as the “transparency flag”. If the register is clocked, it behaves normally (i.e. it propagates the input value to the output value at every clock cycle). Conversely, if a register is in transparency mode, it propagates the input value without clock latency (i.e. with only a small propagation delay). This means that registers can also be used as route-through resources.
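The difference between the two register states can be illustrated with a minimal behavioural sketch. The `Register` class, its field names and the input sequence below are hypothetical and serve only as an illustration; the small propagation delay of a transparent register is ignored.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Hypothetical behavioural model of a register with a transparency flag."""
    transparent: bool = False  # models the "transparency flag"
    stored: int = 0            # value captured at the last clock edge

    def output(self, value_in: int) -> int:
        # A transparent register passes its input straight through;
        # a clocked register presents the value captured earlier.
        return value_in if self.transparent else self.stored

    def clock(self, value_in: int) -> None:
        # On a clock edge, only a clocked register captures its input.
        if not self.transparent:
            self.stored = value_in


clocked = Register(transparent=False)
through = Register(transparent=True)
clocked_out, through_out = [], []
for v in [1, 2, 3]:  # illustrative input sequence, one value per cycle
    clocked_out.append(clocked.output(v))
    through_out.append(through.output(v))
    clocked.clock(v)
    through.clock(v)
```

In this sketch the clocked register delivers each input one cycle late, whereas the transparent register delivers it in the same cycle, which is why it can serve as a route-through resource.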
A possible result of the placement stage is shown in
The next stage in the development process, and the first step in the method of the present invention, comprises routing the placed netlist. For most of the nets in the netlist, the solution to the routing problem is trivial. Accordingly, no congestion is found.
The path from RX to MY can pass through either input of the multiplexer M13 (and then through register RG13, which can be set as transparent) or be routed around the multiplexer block altogether. This is allowed because, as explained above, the logic elements and the registers can be used as route-through resources.
There are several criteria by which routing algorithms select a path. Typically, the selection depends on the delay across the paths (i.e. the lowest delay path is selected to maximize performance) and the number of congested wires (i.e. congestion is to be avoided). The method of the present invention provides a modified routing algorithm which makes use of a new, additional criterion for choosing optimal paths. The present invention further provides a router which implements the modified routing algorithm by selecting a path which, despite having a longer delay and providing no further benefit to wire congestion, passes through at least one transparent register that can be exploited in the retiming phase.
In the example of
A standard timing-based router would choose the solution that produces the least amount of delay, which is the path going around the multiplexer block. As shown on
In order to solve this problem, the method of the present invention makes use of the synergy between the modified routing algorithm and a retiming algorithm applied after the routing stage.
The router in accordance with the present invention first examines the various paths between RX and MY. In so doing, the router determines that the difference between these paths is localised in the route which stretches from switch o to switch β. As explained above, the example of
The router of the present invention then detects that the paths which go through the multiplexer block contain a pass-through register (i.e. register RG13). This detection step provides a significant advantage in that, although the paths which pass through the pass-through register may not be optimal in terms of timing, the transparent register RG13 within these paths may provide further advantages during the retiming phase.
The next step is to analyse the possible paths and determine the most convenient one for routing the signal. Although a pass-through register situated on a path may be useful at some further point during the routing process, in some cases the additional delay needed to reach the register will be too high. The router in accordance with the present invention therefore uses a criterion to decide whether to accept a relatively long path comprising one or more pass-through registers. This criterion could be any mathematical or logical criterion, for example a simple cost function (e.g. if the difference in delay between the shortest path and the shortest of the paths comprising at least one pass-through register is below a pre-defined threshold, the path is accepted). The threshold can be a fixed number or a “tolerance” (e.g. a percentage of a specific delay) which can be fixed by a user. The user can therefore decide how much delay they are willing to risk for the possibility of benefiting from the use of a pass-through register. Other factors, such as the number of pass-through registers a path may comprise, may also be factored into the analysis step.
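The simple cost function described above can be sketched as follows. This is only one possible reading of the criterion: the function name `choose_path`, the tuple layout, the choice of expressing the tolerance as a fraction of the shortest path delay, and the delay values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical path record: (name, delay, number of pass-through registers)
Path = Tuple[str, float, int]


def choose_path(paths: Sequence[Path], tolerance: float = 0.10) -> Path:
    """Prefer a path containing a pass-through register if it is not much slower.

    `tolerance` is assumed to be a fraction of the shortest path delay.
    """
    shortest = min(paths, key=lambda p: p[1])
    with_registers = [p for p in paths if p[2] > 0]
    if not with_registers:
        return shortest
    candidate = min(with_registers, key=lambda p: p[1])
    threshold = tolerance * shortest[1]
    # Accept the longer path only if the extra delay is below the threshold.
    if candidate[1] - shortest[1] < threshold:
        return candidate
    return shortest


# Illustrative delays only; they are not taken from the example in the figures.
paths = [("around_block", 0.50, 0), ("through_mux", 0.54, 1)]
accepted = choose_path(paths, tolerance=0.10)  # extra 0.04 is below 0.05
rejected = choose_path(paths, tolerance=0.05)  # extra 0.04 is not below 0.025
```

A fixed threshold could be used instead of a percentage by replacing the computation of `threshold` with a constant, and the number of pass-through registers on a path could be folded into the comparison as a further factor.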
An example of the above will now be described with reference to the example of
By contrast, in the method of the present invention, performing a “move” of a register means swapping the “transparency” state holder of a pair of registers, so that one of them is “demoted” to being a transparent route-through register, and the other one is “promoted” to be a clocked register. Likewise, “inserting” a register means switching its “transparency” flag and promoting it to be a clocked register.
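The “move” and “insert” operations can be sketched as simple flag updates. The dictionary representation, the function names and the register name `R_specific` are hypothetical; only the idea of swapping or switching transparency flags is taken from the description.

```python
from typing import Dict

# Hypothetical representation: register name -> transparency flag
# (True means transparent route-through, False means clocked).
Flags = Dict[str, bool]


def move_register(flags: Flags, demote: str, promote: str) -> None:
    """'Move' a register by swapping the flags of a clocked/transparent pair."""
    if flags[demote] or not flags[promote]:
        raise ValueError("expected a clocked register and a transparent register")
    flags[demote] = True    # demoted to a transparent route-through register
    flags[promote] = False  # promoted to a clocked register


def insert_register(flags: Flags, name: str) -> None:
    """'Insert' a register by promoting a transparent register to clocked."""
    if not flags[name]:
        raise ValueError("register is already clocked")
    flags[name] = False


# Illustrative starting state; which registers are paired is an assumption.
flags = {"R_specific": False, "RG13": True}
move_register(flags, demote="R_specific", promote="RG13")
```

No register changes physical location in this sketch: only the two flags change, so the existing placement and routing are left untouched.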
Thus, all transparent registers on the selected path are valid additional locations for use in the retiming algorithm. Because the method of the present invention comprises a step of specifically seeking out transparent registers which can be included in routing paths, the method of the present invention will, on average, have access to a wide range of options relating to which transparent registers it can use in subsequent retiming steps.
The next step of the method is that of calculating the optimal configuration of register locations that will preserve the functionality of the netlist and minimise the longest delay path in the netlist. As will be appreciated, the information which the algorithm uses to calculate the values in this step is accurate, having been extracted after the placement and routing phases. Moreover, the resistance and capacitance of the wires connecting the logic elements are known. Finally, the signal propagation delay through the cells is known, as it can, for example, be looked up in a hardware characterisation database. Accordingly, the delay across each path will be accurately determined rather than being estimated.
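One way to picture this calculation for a single routed path is sketched below. It is a simplified model, not the retiming algorithm itself: it assumes that only route-through resources lie between the specific register and each candidate transparent register (so that the swap preserves functionality), that all delays are already known, and that the function name `best_promotion` and the delay values are purely illustrative.

```python
from typing import List, Optional, Tuple


def best_promotion(upstream_delay: float,
                   segments: List[float]) -> Tuple[Optional[int], float]:
    """Pick the transparent register whose promotion most reduces the longest delay.

    upstream_delay: delay of the path arriving at the specific (clocked) register.
    segments: delays of the consecutive segments after the specific register;
              a transparent register is assumed to sit between segments[i]
              and segments[i + 1].
    Returns (index of the register to promote, reduction), or (None, 0.0)
    if no promotion reduces the longest delay.
    """
    total = sum(segments)
    baseline = max(upstream_delay, total)
    best_index: Optional[int] = None
    best_reduction = 0.0
    before = 0.0
    for i in range(len(segments) - 1):
        before += segments[i]
        # After the swap, the upstream path extends up to the promoted register
        # and the downstream path starts from it.
        new_longest = max(upstream_delay + before, total - before)
        reduction = baseline - new_longest
        if reduction > best_reduction:
            best_index, best_reduction = i, reduction
    return best_index, best_reduction


# Illustrative delays only (arbitrary units).
choice = best_promotion(0.25, [0.5, 0.25, 0.25])
no_gain = best_promotion(1.0, [0.5, 0.5])
```

In the first call, promoting the first transparent register balances the two paths and reduces the longest delay from 1.0 to 0.75; in the second, every candidate would lengthen the already critical upstream path, so no register is selected.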
In the example of
As a result of executing the method of the present invention, the longest path is the one from RA, through MX, to RX. This route is 0.74 units in length. This reduced delay permits the maximum clock frequency of the design to be increased. This represents a 27% improvement in performance over the result of a basic routing step and a 13.5% improvement in performance over the result achieved with a standard routing algorithm.
Number | Date | Country | Kind |
---|---|---|---|
08102782.3 | Mar 2008 | EP | regional |
08102782.3 | Mar 2008 | GB | national |