The present invention is generally related to the field of processor simulation and, more particularly, to a two-phase processor power consumption simulation method and a system for implementing the method.
The power wall has become a critical issue for modern electronic system designs, as exemplified by the steadily shrinking power budgets and ever-growing number of functional components of portable electronic devices. Therefore, reducing the power consumption of the electronic components therein is a necessary approach to addressing this issue. Particular emphasis is placed on the power consumption of the processor, which generally refers to a CPU, a logic chip, or another apparatus with processing ability. The industry has attempted to modify the circuits within the processor to lower its power consumption.
In the early days, a system designer needed to implement the whole processor in order to test its power consumption. If the test result did not meet expectations, the system designer would modify the layout of the components or the architecture within the processor again and again to obtain a processor with lower power consumption. However, every modification of the processor incurred substantial additional cost. Consequently, methods for simulating the execution of a processor have been provided in the prior art, so that the power consumption can be predicted before the implementation of the processor is finished. The power consumption result may thereby be acquired during the design stage, so that further modifications can be made as early as possible. A fast and accurate system-level power estimation tool is essential for effective design space exploration. However, existing system-level processor power simulation tools cannot provide simulation results that are both fast and accurate.
Processor power estimation has been studied for many years. For example, an instruction level power analysis (ILPA) model has been provided. However, it cannot achieve pipeline-accurate power estimation due to the lack of detailed pipeline power information.
For better accuracy, several works have proposed an architecture level power analysis (ALPA) approach, which provides a fine-grained simulation model for detailed simulation. However, simulation speed is sacrificed: architecture-level simulation is usually more than 1,000 times slower than ILPA.
For faster power consumption evaluation of peripheral cores, Givargis et al. have proposed a trace-driven simulation technique. The main idea is similar to ILPA, i.e., they break the functionality of each core into several instructions and then characterize the power consumption of each instruction. For example, Reset, Enable_tx, Enable_rx, Send, and Receive are the selected instructions for a universal asynchronous receiver and transmitter (UART). The problem with this approach is that instruction traces are generated by functional models without timing information. Hence, timing-sensitive events, such as interrupts, may lead to incorrect results.
All in all, the dilemma is that a fine-grained model is required for accurate power estimation, but its simulation speed is poor. On the other hand, a coarse-grained simulation model, although fast, generates insufficient states to support accurate power calculation.
Consequently, the embodiments of the present invention provide a processor power consumption simulation method and a system of the same, for addressing the above-mentioned problems.
In one aspect of the embodiments of the present invention, a method for simulating processor power consumption is provided. The method comprises: simulating a simulated processor by a simulation module; utilizing a power analysis model to analyze the simulated processor's execution of at least one fragment of a program, for generating power analysis of a plurality of basic blocks of the at least one fragment by an analysis module; computing at least one power correction factor between the plurality of basic blocks by a correction module; utilizing a processing apparatus to generate a simulation model with power annotation based on the power analysis and the at least one power correction factor by an annotation module; and predicting power consumption of the simulated processor based on the simulation model with power annotation by a prediction module.
In another aspect of the embodiments of the present invention, a storage medium readable by a processor, storing instructions executable by the processor to perform a method for simulating processor power consumption is provided. The method comprises the above-mentioned steps.
In still another aspect of the embodiments of the present invention, a software product tangibly embedded in a computer readable storage medium for simulating processor power consumption is provided. The software product comprises instructions operable to cause a processing apparatus to perform the above-mentioned steps.
In yet another aspect of the embodiments of the present invention, a system for simulating processor power consumption is provided. The system comprises: a control module; a simulation module, coupled to the control module, for simulating a simulated processor; an analysis module, coupled to the control module, for utilizing a power analysis model to analyze the simulated processor's execution of at least one fragment of a program and generate power analysis of a plurality of basic blocks of the at least one fragment; a correction module, coupled to the control module, for computing at least one power correction factor between the plurality of basic blocks; an annotation module, coupled to the control module, for generating a simulation model with power annotation based on the power analysis and the at least one power correction factor; and a prediction module, coupled to the control module, for predicting power consumption of the simulated processor based on the simulation model with power annotation.
Utilizing the method and system provided by the embodiments of the present invention, electronic system designers may identify processor power consumption issues as early as possible when executing software, which is beneficial for effective design space exploration.
In the embodiments of the present invention, a method for simulating processor power consumption is provided. For achieving the method, a system for simulating processor power consumption is also provided in the embodiments of the present invention. A programmable computer can be utilized to implement the system. For example, a hardware apparatus for implementing an embodiment of the present invention is shown in
Utilizing the system 200 mentioned above, a method 300 for simulating processor power consumption can be provided, as shown in
For example,
In
On the other hand, if the target branch is mis-predicted, the pipeline has to be flushed to clean up pre-fetched instructions, shown in
In one embodiment of the present invention, when executed independently, the basic block B 620 may consume 24 units of power, the basic block C 630 may consume 20 units of power, and the basic block D 640 may consume 15 units of power. Basic block B 620 may comprise a branch instruction “i4”. The consecutive execution of basic block B 620 followed by basic block C 630 may cost an additional 2 units of power when the branch is correctly predicted, while a mis-predicted B to C branch costs an additional 3 units of power. Therefore, the power correction factor on the branch is (2, 3), as shown in
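As a minimal sketch of how the branch correction factor from this example could be applied, the following Python snippet sums the per-block power of an executed trace and adds the correction for each B to C transition. The names `BLOCK_POWER`, `BRANCH_CORRECTION`, and `trace_power` are hypothetical and chosen only for illustration; the block values and the (2, 3) factor are taken from the example above.

```python
# Per-block power when each basic block executes independently (units from the example).
BLOCK_POWER = {"B": 24, "C": 20, "D": 15}

# Power correction factor per branch edge: (predicted, mis-predicted) extra units.
BRANCH_CORRECTION = {("B", "C"): (2, 3)}


def trace_power(trace):
    """Sum block power plus branch corrections over an executed trace.

    `trace` is a list of (block, mispredicted) pairs, where `mispredicted`
    says whether the branch that led INTO this block was mis-predicted.
    """
    total = 0
    previous = None
    for block, mispredicted in trace:
        total += BLOCK_POWER[block]
        if previous is not None:
            # Edges without a characterized factor contribute no correction here.
            predicted_cost, mispredicted_cost = BRANCH_CORRECTION.get(
                (previous, block), (0, 0)
            )
            total += mispredicted_cost if mispredicted else predicted_cost
        previous = block
    return total


predicted_total = trace_power([("B", False), ("C", False)])    # 24 + 20 + 2
mispredicted_total = trace_power([("B", False), ("C", True)])  # 24 + 20 + 3
```

Under these assumptions, the correctly predicted B to C sequence totals 46 units and the mis-predicted one totals 47 units.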
The implementation of the other correction factors shown in
Likewise, extra power is needed for the pipeline stalls or freezes caused by cache misses. In general, the pipeline behaves differently on data/instruction cache misses and hits, depending on the pipeline architecture. In some embodiments of the present invention, the cache miss penalty power correction is also considered. Take the OR1200 RISC processor as an example. When an instruction cache miss occurs while a load/store instruction is progressing at the execution stage with a data cache hit, an NOP (i.e. pipeline stall) is inserted to keep the pipeline progressing; in contrast, when a data cache miss occurs, the pipeline is frozen. Whether a cache miss causes a pipeline stall or freeze, and thereby affects processor power consumption, can only be determined at runtime. In practice, however, the per-cycle power consumption of stalling or freezing can be pre-characterized. Hence, once the number of cycles stalled or frozen is known at runtime, the additional power consumption caused by cache misses can easily be calculated. In the embodiments of the present invention, the above-mentioned extra power consumption is acquired by utilizing the correction module 240.
To determine the number of stalled cycles due to cache miss latency, many models can be applied. For example, CACTI is a possible memory model, and the counter approach proposed by Atitallah et al. is another possibility. The cycle count accurate memory model proposed by Yi-Len Lo et al. is still another candidate, and it is utilized in the preferred embodiments of the present invention. Further, cache access latency is also counted dynamically. Thus, the per-cycle energy consumption of freeze and stall may be pre-characterized, and the number of stall and freeze cycles may be counted at runtime.
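The cache miss correction described above can be sketched as a simple product of pre-characterized per-cycle energy and cycle counts observed at runtime. The class `CachePenaltyModel` and the numeric values below are hypothetical placeholders for illustration; they are not measured OR1200 figures.

```python
from dataclasses import dataclass


@dataclass
class CachePenaltyModel:
    # Pre-characterized energy per cycle; the values used below are placeholders.
    stall_energy_per_cycle: float
    freeze_energy_per_cycle: float

    def extra_energy(self, stall_cycles: int, freeze_cycles: int) -> float:
        """Extra energy from cache misses, given cycle counts observed at runtime."""
        return (stall_cycles * self.stall_energy_per_cycle
                + freeze_cycles * self.freeze_energy_per_cycle)


# Hypothetical characterization and runtime counts.
model = CachePenaltyModel(stall_energy_per_cycle=0.5, freeze_energy_per_cycle=0.25)
penalty = model.extra_energy(stall_cycles=4, freeze_cycles=8)  # 4 * 0.5 + 8 * 0.25
```

Keeping the per-cycle values separate from the cycle counts mirrors the split described above: the former may be characterized once in advance, while the latter are only known when the program runs.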
In one embodiment of the present invention, an open source 32-bit RISC processor OR1200 is adopted, a gate-level power estimation tool PrimePower is used for power characterization, and a static compilation technique is adopted for instruction set simulation (ISS) implementation. The test cases of the benchmark are mainly from the OpenRISC project at the OpenCores organization, and are run on a host machine with a 3.4 GHz quad-core Intel Xeon and 2 GB of RAM.
For accuracy comparison, in another embodiment of the present invention, the benchmark is run with the example, ALPA, and ILPA on the same set of test cases. The test cases comprise “basic”, “cbasic”, “mul”, and “dhry”. As shown in
Using the detailed gate level power analysis tool PrimePower as a golden reference, a further comparison of the examples with and without power correction factors, assuming an ideal cache, is provided, as shown in
In another embodiment of the present invention, a direct mapped cache is adopted for considering cache misses. In this embodiment, it can be observed that the average error rate is more than 14% without cache miss corrections. Notably, the error rate of the basic test case is higher than that of the others. This is because it contains no loop structure, and hence cache misses occur frequently.
In some embodiments of the present invention, a storage medium readable by a processor, storing instructions executable by the processor to perform a method for simulating processor power consumption is provided. The method comprises the above-mentioned steps.
In some other embodiments of the present invention, a software product tangibly embedded in a computer readable storage medium for simulating processor power consumption is provided. The software product comprises instructions operable to cause a processing apparatus to perform a method for simulating processor power consumption. The method comprises the above-mentioned steps.
One advantage of the embodiments of the present invention is that a two-phase simulation method is utilized. A relatively more accurate power analysis model, such as a gate level power analysis model, is utilized to analyze one fragment of a target program, for acquiring the power analysis of its basic blocks and the power correction factor between the basic blocks. A simulation model with relatively faster simulation speed is then utilized to simulate with the mentioned power analysis and the power correction factor, whereby the prior art problems of the low simulation speed of a fine-grained power analysis model and the poor accuracy of a coarse-grained simulation model can be overcome.
Another advantage of the embodiments of the present invention is that the effects of pipeline, branch, and/or cache miss are considered. Thus, the method and system provided by the present invention can be applied to processor simulation models with more complicated architectures. The improvement of the embodiments of the present invention is not obvious over the prior art, and the effect is supported by the experimental data.
Yet another advantage of the embodiments of the present invention is that fragments of a program which are repeated frequently, such as loop structures, can be computed quickly utilizing the model with power annotation, and thus further detailed power analysis can be avoided, without the time-consuming re-calculation required in conventional power simulators.
Through the detailed description above, the spirit and features should be thoroughly understood by those of ordinary skill in the art. However, the details in the embodiments are only for example and explanation. Those of ordinary skill in the art may make modifications according to the teaching and suggestion of the embodiments of the present invention to suit various situations, and such modifications should be viewed as within the scope of the present invention without departing from the spirit of the present invention. The scope of the present invention should be defined by the following claims and their equivalents.
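The reuse of power annotation for repeated fragments can be illustrated with a minimal Python sketch. Here `detailed_analysis` is a hypothetical stand-in for the slow, fine-grained analysis and simply returns the block values from the earlier B, C, D example; the annotation table is filled lazily for brevity, and branch and cache corrections are omitted, so this is a simplification under stated assumptions rather than the embodiment itself.

```python
# Hypothetical stand-in for a slow, fine-grained power analysis of one basic block.
_REFERENCE_POWER = {"B": 24, "C": 20, "D": 15}
analysis_calls = 0


def detailed_analysis(block):
    """Pretend to run the expensive analysis and count how often it is invoked."""
    global analysis_calls
    analysis_calls += 1
    return _REFERENCE_POWER[block]


def simulate(trace):
    """Predict total power for a trace, analysing each distinct block only once."""
    annotation = {}  # power annotation: block name -> characterized power
    total = 0
    for block in trace:
        if block not in annotation:
            annotation[block] = detailed_analysis(block)  # phase one: characterize
        total += annotation[block]                        # phase two: fast lookup
    return total


# A loop body of B then C repeated 100 times, followed by D.
loop_trace = ["B", "C"] * 100 + ["D"]
total_power = simulate(loop_trace)
```

In this sketch the trace contains 201 block executions, yet the expensive analysis is invoked only three times, once per distinct basic block; every further iteration of the loop is served from the annotation table.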