The invention relates to a method and apparatus for tuning the performance of a digital system such as an IP block or a system on chip (SoC), and in particular to a method and apparatus for tuning the performance of a digital system for best execution according to a particular application.
There is a continual drive to improve the hardware design of digital systems to obtain the best possible performance in terms of speed, power consumption, error free operation, and so on. In addition to improving the actual hardware designs of digital systems, there is also a continual drive to improve the performance of any given digital system by changing its operating parameters. For example, it is known to change the operating parameters of digital systems such that they operate at the fastest possible frequency and/or with the lowest possible power consumption, depending on the desired performance for a given application.
Techniques have been developed to adapt the performance of a digital system, for example an isolated IP block or SoC, such that a certain level of performance is guaranteed both in terms of speed and power in some optimal way depending on a particular application.
In
This technique provides a tuning scheme aimed at optimising the performance of an IP block or SoC in real time. The technique determines the optimal power supply (Vdd), threshold voltage (Vb) and clock frequency (f) for a given desired performance in terms of speed and/or power consumption.
Modern digital systems are also facing more and more problems relating to slow interconnect, excessive power demands and complex system composability. These problems have resulted in the concept of partitioning a digital system into islands (ie a group of IPs), each of which is internally synchronous and independent from the rest of the system. In this way the system becomes asynchronous. The performance of each partition or island can be tuned as mentioned above to provide an optimum performance for a given application. While such techniques are advantageous for achieving the desired performance in terms of speed and/or power consumption, the techniques can have detrimental consequences for data throughput and/or data latency of a digital system.
The aim of the present invention is to provide a method and apparatus for tuning the performance of a digital system, without having the disadvantages mentioned above.
According to a first aspect of the present invention there is provided a method of tuning the performance of a digital system. The method comprises the steps of receiving one or more performance indicators relating to the performance of the digital system, and tuning the frequency, supply voltage and/or transistor threshold voltage of the digital system to obtain a desired performance. The method also comprises the step of thereafter adjusting the pipeline depth of the digital system to fine tune the performance of the digital system.
The invention has the advantage of being able to provide an initial tuning step in accordance with performance indicators provided to obtain a desired level of performance, with a pipeline depth adjustment provided for fine tuning the performance of the digital system.
According to another aspect of the invention, there is provided an apparatus for tuning the performance of a digital system. The apparatus comprises means for receiving one or more performance indicators relating to the performance of the digital system, and tuning means for tuning the frequency, supply voltage and/or transistor threshold voltage of the digital system to obtain a desired performance. The apparatus also comprises pipeline configuration means for adjusting the pipeline depth of the digital system after the tuning means has tuned the digital system, thereby fine tuning the performance of the digital system.
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
However, in accordance with the present invention, the digital system 1 also comprises pipeline configuration means 8 for configuring the pipeline depth of the digital system 1. The system also comprises selecting means 10 for selecting the frequency (f), supply voltage (Vdd), transistor threshold voltage (Vb) and pipeline depth (Pd) of the digital system being tuned. The selecting means 10 is configured to select the frequency (f), supply voltage (Vdd), transistor threshold voltage (Vb) and pipeline depth (Pd) of the digital system in accordance with the performance indicators received for a given application, as will be described in greater detail below.
Thus, according to the invention, the tuning involves adjusting the pipeline depth (Pd), in addition to tuning the frequency (f), supply voltage (Vdd) and/or the transistor threshold voltage (Vb) of the digital system 1. In this way the adjustment of the pipeline depth acts as a means of fine tuning the digital system, after the digital system has been tuned in terms of the frequency (f), supply voltage (Vdd) and/or the transistor threshold voltage (Vb).
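By way of illustration only, the two-step tuning described above may be sketched as follows. The sketch assumes that the performance indicator can be reduced to a minimum frequency plus a preference among the allowed pipeline depths, and that dynamic power scales roughly with f multiplied by Vdd squared; the names `OperatingPoint`, `tune`, `depths_for` and `prefer` are hypothetical and do not appear in the described embodiment. The threshold voltage (Vb) is carried in the operating point but ignored by the simple power proxy.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class OperatingPoint(NamedTuple):
    f_mhz: int   # clock frequency in MHz
    vdd_mv: int  # supply voltage in mV
    vb_mv: int   # transistor threshold (body bias) voltage in mV


def tune(
    points: Sequence[OperatingPoint],
    min_f_mhz: int,
    depths_for: Callable[[OperatingPoint], List[int]],
    prefer: Callable[[List[int]], int],
) -> Tuple[OperatingPoint, int]:
    """Coarse-tune (f, Vdd, Vb) first, then fine-tune the pipeline depth."""
    # Step 1: keep only the operating points that meet the requested frequency.
    feasible = [p for p in points if p.f_mhz >= min_f_mhz]
    if not feasible:
        raise ValueError("no operating point meets the requested frequency")
    # Assumed power proxy: dynamic power is roughly proportional to f * Vdd^2.
    point = min(feasible, key=lambda p: p.f_mhz * p.vdd_mv ** 2)
    # Step 2: only after the point is fixed, choose among the allowed depths.
    depth = prefer(depths_for(point))
    return point, depth
```

For example, passing `prefer=min` would select the shallowest allowed pipeline for the chosen operating point, which under these assumptions favours latency.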
The selecting means 10 can be configured to determine the best possible pipeline depth for any given frequency in order to optimise throughput, latency, or a compromise or average of throughput and latency. Alternatively, the selecting means 10 can be configured to determine a range of possible pipeline depths for any given frequency. This is because the frequency provides a hard constraint on the pipeline depth in terms of maximum delay between two stages in the pipeline. The power supply (Vdd) and the transistor threshold voltage (Vb) also alter the delay and, in this sense, they also influence this hard constraint. It will be appreciated that this is an upper delay constraint, but smaller delays (corresponding to deeper pipelines) are allowed, and this will depend solely on the performance indicator received from the software.
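The hard constraint imposed by the frequency can be illustrated with a minimal sketch. It assumes, purely for illustration, that the logic between the pipeline input and output has a total delay that can be divided evenly among the stages, and that each register bank adds a fixed overhead; the function name `feasible_depths` and its parameters are hypothetical. The total logic delay passed in is understood to be the delay at the current supply voltage (Vdd) and threshold voltage (Vb).

```python
from typing import List


def feasible_depths(
    logic_delay_ps: int,
    reg_overhead_ps: int,
    period_ps: int,
    max_depth: int,
) -> List[int]:
    """Return the pipeline depths whose per-stage delay fits in one clock period.

    Assumption: the logic splits evenly, so a depth Pd gives a stage delay of
    logic_delay_ps / Pd plus the register overhead.
    """
    slack_ps = period_ps - reg_overhead_ps
    if slack_ps <= 0:
        return []  # the clock period cannot even cover the register overhead
    # Smallest depth satisfying logic_delay_ps / Pd <= slack_ps (ceiling division).
    min_depth = max(1, -(-logic_delay_ps // slack_ps))
    # Deeper pipelines give smaller stage delays, so they remain allowed
    # up to the depth supported by the hardware.
    return list(range(min_depth, max_depth + 1))
```

With a 10000 ps logic delay, 100 ps register overhead, a 2000 ps clock period and a hardware maximum of eight stages, the sketch returns depths 6, 7 and 8; the choice within that range would then follow from the performance indicator.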
The selecting means 10 can be configured to determine the pipeline depth on-the-fly. In other words, the selecting means 10 can be configured to dynamically determine the pipeline depth in response to the performance indicator or indicators received from the software. Alternatively, the selecting means 10 can be configured to select a pipeline depth based on pre-calculated values stored in a look-up table. With the latter, the look-up table comprises a list of pipeline depths required to provide a certain throughput or latency for different combinations of frequency (f), supply voltage (Vdd) and/or transistor threshold voltage (Vb).
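The look-up table alternative may be sketched as below. The table layout, the key units (MHz and mV), the goal names and every depth value are assumptions made for illustration; in particular the stored depths are placeholders and not characterised data.

```python
# Hypothetical pre-calculated table. Keys are (f in MHz, Vdd in mV, Vb in mV);
# the depth values are placeholders only.
DEPTH_LUT = {
    (200, 900, 0): {"throughput": 4, "latency": 2},
    (400, 1100, 0): {"throughput": 6, "latency": 4},
    (400, 1200, 0): {"throughput": 5, "latency": 3},
}


def select_depth(f_mhz: int, vdd_mv: int, vb_mv: int, goal: str) -> int:
    """Return the stored pipeline depth for an operating point and a goal."""
    key = (f_mhz, vdd_mv, vb_mv)
    entry = DEPTH_LUT.get(key)
    if entry is None:
        raise KeyError(f"no table entry for operating point {key}")
    if goal not in entry:
        raise ValueError(f"unknown goal {goal!r}")
    return entry[goal]
```

A table of this kind trades storage for speed: no calculation is needed at run time, but only the pre-calculated combinations can be selected.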
The step of configuring the pipeline involves changing the depth of the pipeline. The depth of the pipeline can be changed by skipping one or more register banks separating pipeline stages in the digital system. This allows performance to be changed in terms of data throughput or data latency depending on the particular application. As will be appreciated by a person skilled in the art, the throughput of a pipeline is the measure of how often an instruction exits the pipeline, ie the number of instructions completed per second. In contrast, pipeline latency relates to how long it takes to execute a single instruction in the pipeline.
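The effect of skipping register banks can be illustrated with a simple model. It assumes that skipping the register bank between two stages merges their logic into one longer stage, that register overhead is negligible, and that the clock period is set by the slowest merged stage; the function names and the stage delays used in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def merged_stage_delays(stage_delays_ps: Sequence[int], skip: Sequence[bool]) -> List[int]:
    """Merge logic stages across skipped register banks.

    stage_delays_ps holds the delays of N logic stages; skip holds N-1 flags,
    where skip[i] is True if the register bank after stage i is skipped.
    """
    if len(skip) != len(stage_delays_ps) - 1:
        raise ValueError("need exactly one skip flag per register bank")
    merged = [stage_delays_ps[0]]
    for delay, skipped in zip(stage_delays_ps[1:], skip):
        if skipped:
            merged[-1] += delay  # the two stages fuse into one longer stage
        else:
            merged.append(delay)
    return merged


def pipeline_metrics(stage_delays_ps: Sequence[int], skip: Sequence[bool]) -> Dict[str, int]:
    """Return the depth, minimum clock period and latency of the configured pipeline."""
    merged = merged_stage_delays(stage_delays_ps, skip)
    period_ps = max(merged)  # the slowest stage bounds the clock period
    return {
        "depth": len(merged),
        "min_period_ps": period_ps,
        "latency_ps": len(merged) * period_ps,
    }
```

With stage delays of 400, 600, 300 and 700 ps, the full four-stage pipeline has a minimum period of 700 ps and a latency of 2800 ps. Skipping the first and third register banks gives two stages of 1000 ps each, so the latency falls to 2000 ps while the minimum period, and hence the interval between completed instructions, rises to 1000 ps.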
Although it is known to change the depth of a pipeline per se, the depth is normally changed to reduce frequency, which in turn reduces power consumption. This has limited advantages in isolation. The present invention differs in that the system is first tuned in terms of supply voltage (Vdd), frequency (f) and/or transistor threshold voltage (Vb), for example to reduce power consumption, but with a further adjustment made to adjust the pipeline depth. In other words, the tuning of the supply voltage (Vdd), frequency (f) and transistor threshold voltage (Vb) for reduced power consumption will have the side effect of reducing the overall performance of the system, which is then compensated by tuning the pipeline depth to improve performance, ie either for data throughput or data latency optimisation.
If the noise is below the maximum level, the controller moves to state 53, where a pipeline check is performed. Here the performance indicator is translated into the triple (pipeline depth, frequency, supply) which minimises power and is easier to reach (a local minimum with minimum state distance, where locality is determined by the delay in the changes of supply and frequency and a design constraint on how long it should take to reach the new triple). The triple is then imposed on the system by means of the delay loop (54) and supply loop (55). These loops are not independent as there is an order in which the pipeline depth, supply voltage and clock frequency must be changed. For example, preferably the frequency should not be increased until the power supply has been increased. Also, a decrease in power supply should preferably be preceded by a frequency decrease. It will be appreciated that the change in transistor threshold voltage (Vb) can be hidden in the power, speed and noise actions.
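The preferred ordering of frequency and supply changes may be sketched as follows. The function name `transition_steps` and the dictionary keys are hypothetical. The description does not state where a change of pipeline depth falls in the sequence, nor how mixed cases (for example a frequency increase combined with a supply decrease) are ordered, so the sketch leaves the depth out and treats the mixed cases arbitrarily.

```python
from typing import Dict, List, Tuple


def transition_steps(cur: Dict[str, int], new: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order frequency and supply changes when moving between operating points.

    Both dictionaries carry 'f_mhz' and 'vdd_mv'. The result is the list of
    (parameter, value) updates in the order they would be applied.
    """
    steps: List[Tuple[str, int]] = []
    # A frequency decrease is applied first, so it precedes any supply decrease.
    if new["f_mhz"] < cur["f_mhz"]:
        steps.append(("f_mhz", new["f_mhz"]))
    # The supply change sits in the middle of the sequence.
    if new["vdd_mv"] != cur["vdd_mv"]:
        steps.append(("vdd_mv", new["vdd_mv"]))
    # A frequency increase is applied last, after any supply increase.
    if new["f_mhz"] > cur["f_mhz"]:
        steps.append(("f_mhz", new["f_mhz"]))
    return steps
```

Moving from 200 MHz at 900 mV to 400 MHz at 1100 mV therefore raises the supply before the frequency, while the reverse move lowers the frequency before the supply.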
The controller is such that, even without changing the performance indicator, it might change the triple (pipeline depth, frequency, supply) due to the fact that it always pursues a constrained local minimum. This may occur, for example, when the values of power supply (Vdd), frequency (f), transistor threshold voltage (Vb) and pipeline depth (Pd) are changed because of changes in environmental conditions, such as temperature.
Although the preferred embodiment refers to the digital system being an IP block or SoC, it will be appreciated that the digital system may be any form of integrated circuit, including integrated circuits partitioned into separate regions or islands.
Furthermore, although the performance indicators are described as being communicated from the software to the hardware in the form of dedicated instructions, it will be appreciated that the performance indicators can be provided in other ways.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfill the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
| Number | Date | Country | Kind |
|---|---|---|---|
| 05100153.5 | Jan 2005 | EP | regional |
| PCT/IB2006/050083 | Jan 2006 | IB | international |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IB06/50083 | 1/10/2006 | WO | 00 | 6/16/2010 |