1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to the control of an instruction prefetch unit for fetching program instructions to an instruction queue from within a pipelined memory system.
2. Description of the Prior Art
It is known to provide data processing systems having a prefetch unit operable to fetch program instructions from a pipelined memory system, whether that be an L1 cache, a TCM or some other memory, and supply these fetched program instructions into an instruction queue where they are buffered and ordered prior to being issued to a data processing unit, such as a processor core, for execution. In order to improve memory fetch performance, it is known to utilise pipelined memory systems in which multiple memory accesses can be in progress at any given time. Thus, a prefetch unit may initiate a memory access fetch on one cycle with the data corresponding to that memory access fetch being returned several cycles later. Within a data processing system in which changes in program instruction flow, such as branches, are not identified until after the program instructions are actually returned from the memory, then it is possible that several undesired memory access fetches would have been initiated to follow on from the branch instruction and which are not required since the branch instruction will redirect program flow elsewhere. It can also be the case that exceptions or interrupts can arise during program execution resulting in a change in program flow such that memory access fetches already underway are not required. A significant amount of energy is consumed by such unwanted memory access fetches and this is disadvantageous.
Viewed from one aspect the present invention provides a data processing apparatus comprising:
a prefetch unit operable to fetch program instructions from a pipelined memory system;
an instruction queue unit operable to receive program instructions from said prefetch unit and to maintain an instruction queue of program instructions to be passed to a data processing unit for execution; and
a fetch rate controller coupled to said instruction queue unit and responsive to program instructions queued within said instruction queue to generate a fetch rate control signal; wherein
said prefetch unit is responsive to said fetch rate control signal generated by said fetch rate controller to select one of a plurality of target fetch rates for program instructions to be fetched from said pipelined memory system by said prefetch unit, said plurality of target fetch rates including at least two different non-zero target fetch rates.
The present technique recognises that energy is being wasted by performing memory access fetches which will not be required due to changes in program instruction flow. Furthermore, the present technique seeks to reduce this waste of energy by dynamically controlling the fetch rate of the prefetch unit in dependence upon the instructions currently held within the instruction queue. In many cases, the maximum fetch rate is not needed since the instructions will not be issued from the instruction queue to the data processing unit at a rate which needs the maximum fetch rate in order to avoid underflow within the instruction queue. Accordingly, a lower fetch rate may be employed and this reduces the likelihood of memory access fetches being in progress when changes in program instruction flow occur rendering those memory access fetches unwanted. This reduces energy consumption whilst not impacting the overall level of performance since instructions are present within the instruction queue to be issued to the data processing unit when the data processing unit is ready to accept those instructions.
A secondary effect of reducing the number of memory access fetches which are not required is that the probability of cache misses is reduced and accordingly the performance penalties of cache misses can be at least partially reduced.
Whilst it will be appreciated that the fetch rate may be controlled in a wide variety of different ways in dependence upon the program instructions currently stored within the instruction queue, there is a balance between the sophistication and consequent overhead associated with the circuitry for performing this control weighed against the benefit to be gained from more accurate or sophisticated control.
In some simple embodiments of the present technique the fetch rate may be controlled simply in dependence upon how many program instructions are currently queued.
A more sophisticated approach, which is particularly well suited to being matched with the number of stages within the pipelined memory system, is one in which a plurality of occupancy ranges are defined within the instruction queue with the fetch rate being dependent upon which occupancy range currently corresponds to the number of instructions currently queued.
This occupancy range approach is well suited to dynamic adjustment of the control mechanism itself, e.g. underflows of program instructions resulting in a shift in the boundary between occupancy ranges resulting in a tendency to speed up the fetch rate or overflows of the instruction queue shifting the boundaries to result in an overall lowering of the fetch rate.
A more sophisticated and complex control arrangement is one in which the fetch rate controller at least partially decodes at least some of the program instructions within the instruction queue to identify those instructions and accordingly estimate the number of processing cycles which the data processing unit will require to execute those instructions. Thus, an estimate of the total number of processing cycles required to execute the program instructions currently held within the instruction queue may be obtained and this used to control the program instruction fetch rate.
Within some data processing systems multiple program instruction sets are supported and these program instruction sets can have different instruction sizes. In such systems a given fetch from the pipelined memory system may contain a higher number of program instructions if those program instructions are shorter in the length. Accordingly, the fetch rate controller is desirably responsive in at least some embodiments to the currently selected instruction set so that the fetch rate control signal can be adjusted depending upon the currently selected instruction set.
As previously discussed, when a taken branch instruction is encountered this will result in a change in program flow. The present technique helps reduce wasted energy due to unwanted memory access fetches being performed to locations no longer on that program flow. The technique can be further enhanced in at least some embodiments by increasing the fetch rate for a predetermined number of memory access cycles following a taken branch instruction so as to make up for the jump in program flow and refill the instruction queue with a pending workload of program instructions.
The prefetch unit can respond to the fetch rate control signal in a variety of different ways to adjust the overall fetch rate achieved. Particular embodiments are such that the fetch rate control signal controls the prefetch unit to either fetch or not fetch on each memory access cycle with the ratio between memory access cycles when a fetch is or is not performed being dependent upon the fetch rate control signal. Thus, the duty cycle of the prefetch unit is effectively controlled based upon the fetch rate control signal.
Within a two-stage pipelined memory system, a particularly advantageous control mechanism which provides a good degree of energy saving with a relatively low degree of control complexity if one employing fast, medium and low fetch rate control signals, such as may be generated in dependence upon occupancy ranges of the instruction due as previously discussed.
Viewed from another aspect the present invention provides a method of processing data comprising:
fetching program instructions from a pipelined memory system;
receiving said program instructions from said memory and maintaining an instruction queue of program instructions;
in response to program instructions queued within said instruction queue generating a fetch rate control signal; and
in response to said fetch rate control signal selecting one of a plurality of target fetch rates for program instructions to be fetched from said pipelined memory system, said plurality of target fetch rates including at least two different non-zero target fetch rates.
Viewed from a further aspect the present invention provides a data processing apparatus comprising:
a prefetch means for fetching program instructions from a pipelined memory system;
an instruction queue means for receiving program instructions from said prefetch unit and for maintaining an instruction queue of program instructions to be passed to a data processing unit for execution; and
a fetch rate controller means coupled to said instruction queue unit and responsive to program instructions queued within said instruction queue for generating a fetch rate control signal; wherein
said prefetch means is responsive to said fetch rate control signal generated by said fetch rate controller to select one of a plurality of target fetch rates for program instructions to be fetched from said pipelined memory system by said prefetch unit, said plurality of target fetch rates including at least two different non-zero target fetch rates.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.
The program instructions emerging from the prefetch unit 4 are separated into separate program instructions which are passed to the data processing unit (not illustrated) when they emerge from the instruction queue 6. Whilst the program instructions are within the instruction queue 6, the fetch rate controller 8 analyses these queued program instructions to generate a fetch rate control signal which is applied to the prefetch unit 4. The fetch rate control signal is used by the prefetch unit 4 to determine the duty cycle of the fetch or don't fetch signal being applied to the incrementer 16 and accordingly the fetch rate of program instructions from the pipeline memory system 2. The analysis and control of the fetch rate control signal can take a variety of different forms and may also be responsive to a currently selected instruction set within a system supporting multiple instruction sets of different program instruction sizes as well as upon identification of a taken branch instruction by the prefetch unit 4. These control techniques will be discussed below.
O - empty stage
F—fetch
The boundaries between the occupancy ranges illustrated in
A Verilog description of the fetch rate controller 8 required to produce the functionality described above (or at least been a major part thereof) is given in the following:
It will be appreciated that the operation of the processes of
It will be appreciated that whilst additional control complexity may be necessary to perform such partial decoding, this technique recognises that some program instructions take longer to execute than others and that simply estimating that all program instructions take the same number of processing cycles to execute is inaccurate. The relative benefit between the extra accuracy achieved and the extra control complexity required will vary depending upon the particular intended application and in some applications the extra complexity may not be worthwhile.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.