The present invention will be described further, by way of example only, with reference to preferred embodiments thereof as illustrated in the accompanying drawings, in which:
The store buffer 20 stores write requests issued by the processor core prior to those requests being issued to the bus interface unit 40. In this way, the write requests may be received from the processor core and stored temporarily in the store buffer 20 to enable the processor core to continue its operations despite the write requests not yet having been completed. It will be appreciated that this helps to decouple the operation of the processor core from that of the bus interface unit 40, preventing the processor core from stalling and thereby enabling the processor core to operate more efficiently.
Similarly, the pre-load unit 30 can store pre-load requests issued by the processor core prior to these being issued to the bus interface unit 40. Once again, this enables the processor core to continue its operations even when the pre-load requests have not yet been completed.
It will be appreciated that other buffers or units may be provided which can receive requests from a processor core or other data processing unit prior to issuing those requests for execution, to enable those units to operate as efficiently as possible.
Once a request has been received by the store buffer 20 or the pre-load unit 30, that unit will request that the bus interface unit 40 provide access to the AXI bus 50 by asserting a request signal on the line 25 or 35 respectively.
In the event that there is currently no activity on the AXI bus 50, the bus interface unit 40 will arbitrate between request signals provided by different units. Once the arbitration has been made, generally based on relative priorities assigned to requests from different units, an acknowledge signal is provided over the path 27 or 37, depending on which unit is allocated access to the AXI bus 50. Should a unit be granted immediate access to the AXI bus 50 on receipt of a request, that request may be passed straight to the bus interface unit 40 without necessarily needing to be stored by that unit. However, it will be appreciated that it would also be possible to always store each request received by a unit and then indicate that the request has been issued and can be overwritten in the unit once it has been accepted by the bus interface unit 40. Accordingly, in the event that the AXI bus 50 is available immediately or shortly after each request has been received by the store buffer 20, these requests can be passed straight to the bus interface unit 40 for transmission over the AXI bus 50 without any optimization. Similarly, in the event that the AXI bus 50 is readily available, any pre-load instructions provided to the pre-load unit 30 may be rapidly forwarded to the bus interface unit 40 for transmission over the AXI bus 50 without modification.
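By way of illustration only, the priority-based arbitration described above might be sketched as follows. The unit names and priority values here are illustrative assumptions and do not form part of the embodiment:

```python
# Hypothetical sketch of the bus interface unit's arbitration: when
# the AXI bus is idle, acknowledge the requesting unit with the
# highest assigned priority. The names and priority values below are
# illustrative assumptions only.

PRIORITIES = {"store_buffer": 2, "preload_unit": 1}

def arbitrate(requesting_units):
    """Given the set of units currently asserting a request signal,
    return the unit to acknowledge, or None if no unit is requesting."""
    if not requesting_units:
        return None
    return max(requesting_units, key=lambda unit: PRIORITIES[unit])
```

Under this sketch, if both units assert their request signals simultaneously, the store buffer is acknowledged first by virtue of its (assumed) higher priority.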
To illustrate this, consider the following sequence of requests issued by the processor core to the store buffer 20 when the AXI bus 50 has high availability: STR@0; STR@0+8; and STB@0+1.
The store buffer 20 will assign the STR@0 request to slot 0. Then, the store buffer 20 will drain the STR@0 request to the bus interface unit 40. This will occur before the STR@0+8 request has been received, so that request is assigned to slot 1 without being linked with slot 0. Then, the store buffer 20 will drain the STR@0+8 request to the bus interface unit 40. Following this, the STB@0+1 request is received by the store buffer 20. This will be assigned to slot 2 since the STR@0 request has already been drained and so there is no opportunity to merge these requests together in slot 0.
Accordingly, because the bus interface unit 40 accepts requests from the store buffer 20 straight away due to there being availability on the AXI bus 50, the link and merge features of the store buffer are not utilized. Accordingly, when the AXI bus 50 has high availability, it simply receives the requests STR@0, STR@0+8 and STB@0+1 in turn.
Similarly, in the event that the pre-load unit 30 receives the instructions PLDA, PLDB and PLDC, each of these instructions will be drained quickly to the bus interface unit 40 for transmission over the AXI bus 50 before the next pre-load instruction is received by the pre-load unit 30. Accordingly, the AXI bus 50 also receives the instructions PLDA, PLDB and PLDC.
However, in the event that the availability of the AXI bus 50 is low, typically due to high levels of activity on the AXI bus 50, optimization of the pending requests within the store buffer 20 and the pre-load unit 30 will occur.
Hence, if the same sequence of instructions mentioned above is provided to the store buffer 20 when the availability of the AXI bus 50 is low, the bus interface unit 40 will indicate to the store buffer 20 that the AXI bus 50 is unable to accept requests. The requests are then held in the store buffer 20 and the merge and link capabilities of the store buffer 20 can be utilized.
Accordingly, the instruction STR@0 is stored in slot 0. Then, the instruction STR@0+8 is received, stored in slot 1 and linked with slot 0. When the request STB@0+1 is received, this is then merged into slot 0.
Hence, when the bus interface unit 40 then indicates that the AXI bus 50 is able to receive requests, the store buffer 20 will send a request STM4@0 to the bus interface unit 40 for transmission over the AXI bus 50 in place of the three separate requests. It will be appreciated that the transmission of a single STM4 instruction rather than multiple STR or STB instructions provides for more efficient use of the AXI bus 50 when its availability is low.
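The link-and-merge behaviour described above can be sketched in simplified form as follows. This is a minimal model under stated assumptions only: each STR is taken to cover 8 bytes, each STB 1 byte, and two linked slots covering a contiguous 16-byte region are taken to drain as a single STM4 request; none of these parameters are specified by the embodiment itself:

```python
# Simplified sketch of the store buffer's slots while the AXI bus is
# busy. Assumptions (not from the embodiment): an STR covers 8 bytes,
# an STB covers 1 byte, and two linked contiguous slots drain as one
# STM4 request.

class StoreBuffer:
    def __init__(self):
        self.slots = []  # each slot: {"addr": int, "size": int}

    def accept(self, kind, addr):
        size = 8 if kind == "STR" else 1
        # Merge: a byte store falling within an existing slot's
        # address range is absorbed into that slot.
        for slot in self.slots:
            if slot["addr"] <= addr < slot["addr"] + slot["size"]:
                return  # merged into the existing slot
        self.slots.append({"addr": addr, "size": size})

    def drain(self):
        # Link: two slots covering a contiguous region drain as a
        # single STM4 request rather than separate STR requests.
        if (len(self.slots) == 2 and
                self.slots[0]["addr"] + self.slots[0]["size"]
                == self.slots[1]["addr"]):
            return [f"STM4@{self.slots[0]['addr']}"]
        return [f"STR@{s['addr']}" for s in self.slots]
```

Under these assumptions, accepting STR@0, STR@0+8 and STB@0+1 while the bus is busy leaves two linked slots (the STB having merged into slot 0), and a single STM4@0 request is drained, matching the example above.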
Similarly, if the same sequence of instructions mentioned above is provided to the pre-load unit 30 when the availability of the AXI bus 50 is low, optimization of the instructions can occur in the pre-load unit 30.
Accordingly, the pre-load unit 30 will receive the PLDA instruction and this will be stored therein. Thereafter, the PLDB instruction will be received and this will overwrite the PLDA instruction so that the PLDA instruction is disregarded. Then, if the PLDC instruction is received before the PLDB instruction is drained to the bus interface unit 40, this PLDC instruction will overwrite the PLDB instruction. Thereafter, the PLDC instruction will be drained to the bus interface unit 40 once access to the AXI bus 50 has been allocated to the pre-load unit 30.
Hence, it can be seen that pending pre-load instructions are dropped when a more recent pre-load instruction is received. By cancelling the earlier pre-load instruction, the number of pre-load instructions which need to be issued to the AXI bus 50 is reduced. Reducing the number of pre-load instructions to be sent to the AXI bus 50 is advantageous since this reduces the load on an already busy AXI bus 50. This then frees the AXI bus 50 to perform more immediately critical transactions which may be required by the processor core. The pre-load instructions may readily be cancelled since these instructions are essentially speculative and the resultant data may not have been used anyway.
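The drop-on-overwrite behaviour of the pre-load unit can be sketched as follows; the single-entry storage shown here is an illustrative simplification of the arrangement described above:

```python
# Sketch of the pre-load unit's behaviour while the AXI bus is busy:
# only the most recent pending pre-load is retained, since pre-loads
# are speculative and may safely be discarded in favour of a newer one.

class PreloadUnit:
    def __init__(self):
        self.pending = None  # single pending pre-load, or None

    def accept(self, preload):
        # A newer pre-load overwrites (cancels) any pending one.
        self.pending = preload

    def drain(self):
        # Called once the bus interface unit grants access to the bus.
        issued, self.pending = self.pending, None
        return issued
```

Under this sketch, accepting PLDA, PLDB and PLDC in turn before any drain occurs leaves only PLDC pending, so a single pre-load is issued over the bus, matching the example above.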
At step S10, the unit receives an instruction or request.
At step S20, the availability of the AXI bus 50 is reviewed.
At step S30, in the event that the AXI bus 50 is available, the instruction or request is transmitted over the AXI bus 50 at step S35 and processing returns to step S10. However, in the event that the AXI bus 50 is unavailable, processing proceeds to step S40.
At step S40, a determination is made as to whether it is possible to optimize the received instruction or request with any pending instructions or requests. In the event that no optimization is possible, processing returns to step S10. However, in the event that it is determined that optimization is possible, processing proceeds to step S50. At step S50, the pending requests are optimized. Thereafter, at step S60, those optimizations are stored and processing then returns to step S10.
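One pass of steps S10 to S60 might be sketched as follows. The helper names (`transmit`, `try_optimize`) are illustrative placeholders, not part of the embodiment:

```python
# Sketch of one pass of steps S10 to S60: a request is received,
# the bus availability is checked, and the request is either
# transmitted immediately or optimized against the pending requests.
# The helper names are illustrative assumptions only.

def handle_request(request, pending, bus_available, transmit, try_optimize):
    """transmit(request): sends a request over the bus (step S35).
    try_optimize(request, pending): returns the optimized pending
    list, or None if no optimization is possible (step S40).
    Returns the pending-request list after this pass."""
    if bus_available:                            # steps S20/S30
        transmit(request)                        # step S35
        return pending
    optimized = try_optimize(request, pending)   # step S40
    if optimized is None:                        # no optimization possible
        return pending                           # return to step S10
    return optimized                             # steps S50/S60: store result
```

For example, with the bus available the request is transmitted at once and the pending list is unchanged; with the bus busy, whatever `try_optimize` produces replaces the stored pending requests.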
In this way, it can be seen that the units determine whether a component of the data processing apparatus, such as the AXI bus 50, is currently unable to support the processing activity and, if so, review the pending requests to see whether they can be altered in some way to assist the subsequent data processing activities. Accordingly, the time available whilst waiting for the component to become available can be utilized to analyze the pending requests and to optimize or alter these requests in some way in order to subsequently improve the performance of the data processing apparatus. Hence, once the component is able to deal with the altered requests, the altered requests will enable the data processing apparatus to operate more efficiently than had the original requests been used.
Although a particular embodiment of the invention has been described herein, it will be apparent that the invention is not limited thereto, and that many modifications and additions may be made within the scope of the invention. For example, various combinations of features of the following dependent claims could be made with features of the independent claims without departing from the scope of the present invention.