1. Field of the Invention
The present invention relates to processors such as a microprocessor and a DSP (Digital Signal Processor), and more particularly, to a data load technique for reading out an unaligned data block from a data memory into a register file included in the processor.
2. Description of Related Art
Processors such as a microprocessor and a DSP (Digital Signal Processor) handle data in units of a predetermined data length. Many processors in current use set this unit to 32 bits (4 bytes) or 64 bits (8 bytes). This unit is called a “word”. When the data unit of the processor is 64 bits, by customary practice the 32-bit unit is often called a “word” and the 64-bit unit a “doubleword”. The length of each register provided in the processor is sufficient to store data of one word or an integral multiple thereof.
The data unit of a peripheral device connected to the processor, such as a data memory, is likewise defined based on the data unit of the processor, which increases the speed of data processing between the processor and the peripheral device. For example, the line width of a cache memory connected to the processor is defined as one word or an integral multiple thereof in accordance with the data unit of the processor. Accordingly, the processor can efficiently load data of one word or an integral multiple thereof into a register in the processor with a single cache access.
When one word of data is stored in the data memory immediately after data shorter than one word, the word may straddle a boundary between one-word units (a word boundary) or a line boundary of the data memory (also called a cache line boundary). In this specification, the term “unaligned data” means one word of data stored across a word boundary. The term “unaligned data block” means unaligned data whose length is at least twice the register length of the processor, that is, two or more words, and whose boundaries do not coincide with the word boundaries of the data memory.
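Stated as address arithmetic, these definitions reduce to a few integer divisions. The sketch below assumes a byte-addressed data memory with an 8-byte (64-bit) word; the constant and function names are hypothetical and serve only to illustrate the definitions.

```python
WORD_BYTES = 8  # assumed: 64-bit word, byte-addressed data memory


def crosses_word_boundary(addr: int, length: int = WORD_BYTES) -> bool:
    """True if the byte range [addr, addr + length) straddles a word boundary."""
    first_word = addr // WORD_BYTES
    last_word = (addr + length - 1) // WORD_BYTES
    return first_word != last_word


def is_unaligned_block(addr: int, n_words: int) -> bool:
    """Unaligned data block: two or more words not starting on a word boundary."""
    return n_words >= 2 and addr % WORD_BYTES != 0


def words_spanned(addr: int, length: int) -> int:
    """Number of aligned words that must be read to cover [addr, addr + length)."""
    return (addr + length - 1) // WORD_BYTES - addr // WORD_BYTES + 1


# A two-word (16-byte) block starting at byte 3 touches three aligned words.
print(words_spanned(3, 16))  # 3
```

In general, an N-word block that is not aligned touches N+1 aligned words, which is why any aligned-access scheme must read one word more than the block length.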
In order to load unaligned data into a register in the processor in aligned form, the MIPS instruction set, a representative instruction set, includes, for example, an LWL (Load Word Left) instruction, an LWR (Load Word Right) instruction, an LDL (Load Double-word Left) instruction, and an LDR (Load Double-word Right) instruction. By executing these instructions in combination, unaligned data can be loaded with two memory accesses. Hereinafter, the LWL, LWR, LDL, and LDR instructions are collectively called “unaligned load instructions”. The unaligned load instructions defined by the MIPS instruction set are described in detail on pages 205 to 209 and 222 to 228 of the document by MIPS Technologies Inc. dated Jul. 1, 2005, entitled “MIPS64 (R) Architecture For Programmers Volume II: The MIPS64 (R) Instruction Set”.
As an example, loading unaligned data with the LDL and LDR instructions will be described with reference to
The 64-bit processor employing the MIPS instruction set can load X3, X2, and X1 from the line at 0000h by executing the LDR instruction and store them right-aligned in the register R8. The 64-bit processor can then load X4 from the line at 0004h by executing the LDL instruction and store X4 left-aligned in the register R8.
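The two-step merge in this example can be modeled at the level of 16-bit elements. The sketch below is a simplified lane model written for illustration only; it is not the MIPS definition of LDL/LDR, which operate at byte granularity and depend on the processor's endianness. The helper names `ldr_like` and `ldl_like`, the four-element layout of each line, and the starting offset k = 1 are assumptions chosen to reproduce the X1 to X4 example.

```python
LANES = 4  # assumed: four 16-bit elements per 64-bit register

# Simplified lane model, NOT the architectural MIPS LDL/LDR semantics.
# Lane 0 is the rightmost (least significant) lane of a register or line.


def ldr_like(reg, line, k):
    """Right part: copy lanes k..3 of `line` into register lanes 0..3-k."""
    out = list(reg)
    for i in range(k, LANES):
        out[i - k] = line[i]
    return out


def ldl_like(reg, line, k):
    """Left part: copy lanes 0..k-1 of `line` into register lanes 4-k..3."""
    out = list(reg)
    for i in range(k):
        out[LANES - k + i] = line[i]
    return out


line_0000h = ["X0", "X1", "X2", "X3"]
line_0004h = ["X4", "X5", "X6", "X7"]
r8 = [None] * LANES
r8 = ldr_like(r8, line_0000h, k=1)  # X1, X2, X3 right-aligned in lanes 0..2
r8 = ldl_like(r8, line_0004h, k=1)  # X4 left-aligned in lane 3
print(list(reversed(r8)))  # ['X4', 'X3', 'X2', 'X1']
```

Each instruction contributes only part of the final word, so one word of unaligned data always costs two instructions in this scheme.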
As stated above, when unaligned load instructions such as the LDL and LDR instructions are used, a total of two instructions must be executed to load a single piece of unaligned data one word long (X1 to X4, for example) into the processor. Therefore, as shown in
As stated above, there is a problem in that many instructions must be executed to load an unaligned data block into the register file in the processor. Because of this problem, when digital filter processing that includes many operations on unaligned data blocks is executed on the processor, its execution time may increase.
According to a first aspect of the present invention, there is provided a processor including an instruction decoder, an instruction execution part and a register file. The instruction decoder is adapted to decode an instruction. The instruction execution part is adapted to execute processing corresponding to the instruction decoded by the instruction decoder. The register file is capable of storing load data from a data memory and supplying input data to the instruction execution part. The register file includes a plurality of registers, each of which is capable of holding a plurality of bits of data. Furthermore, the register file is configured to update the data held by the plurality of registers by shifting the data held by the plurality of registers among the plurality of registers.
As described above, in the processor according to the first aspect of the present invention, the data held in the plurality of registers in the register file can be shifted among the plurality of registers. With the processor thus configured, an unaligned data block stored in the data memory can be loaded into the register file by a simple procedure such as the one described below.
For example, the processor repeatedly executes an instruction for loading data aligned to a word boundary of the data memory (hereinafter, this instruction is called an “aligned load instruction” and this data is called “aligned data”), thereby transferring a plurality of aligned data covering a range that includes the unaligned data block from the data memory to the register file. The processor then executes a shift instruction that causes the register file to perform a data shift operation, shifting the held data among the registers holding the plurality of aligned data. As a result, the processor can store the unaligned data block, properly aligned, in the plurality of registers.
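The two-step procedure can be sketched in Python under several assumptions that the text does not fix: 64-bit words made of four 16-bit elements, a misalignment expressed in whole elements (0 to 3, matching the range of a 2-bit shift amount), the loaded words placed in consecutive registers with the lowest address in R0, and zeros entering the top register during the shift. The function names and the list-of-words memory model are hypothetical. Because the shift amount is always smaller than one register, treating the group of registers as one wide value gives the same result as each register combining its own data with that of its neighbor.

```python
WORD_BITS = 64
LANE_BITS = 16                   # assumed element size (halfword)
LANES = WORD_BITS // LANE_BITS   # 4 elements per word
MASK64 = (1 << WORD_BITS) - 1


def load_unaligned_block(memory, start_lane, n_words):
    """Hypothetical model: N+1 aligned loads followed by one right shift.

    memory: list of 64-bit aligned words (element 0 in the low 16 bits)
    start_lane: element index at which the unaligned block begins
    n_words: block length N in words
    Returns the N aligned words that end up in R0..R(N-1).
    """
    first_word = start_lane // LANES
    k = start_lane % LANES  # misalignment, 0..3 elements

    # Step 1: N+1 aligned loads into consecutive registers R0..RN.
    regs = [memory[first_word + i] for i in range(n_words + 1)]

    # Step 2: one right shift by k elements across R0..RN. Since k < LANES,
    # each register only needs data from its upper neighbor, so the group
    # is modeled as one wide value with R0 in the least significant position.
    wide = 0
    for i, value in enumerate(regs):
        wide |= value << (WORD_BITS * i)
    wide >>= LANE_BITS * k  # assumed: zeros enter at the top of RN
    regs = [(wide >> (WORD_BITS * i)) & MASK64 for i in range(n_words + 1)]

    return regs[:n_words]


def pack(elements):
    """Pack four 16-bit elements into one word, element 0 in the low bits."""
    return sum(e << (LANE_BITS * j) for j, e in enumerate(elements))


memory = [pack(range(4 * w, 4 * w + 4)) for w in range(4)]
block = load_unaligned_block(memory, start_lane=5, n_words=2)
# block[0] holds elements 5..8 and block[1] holds elements 9..12
```

The only per-block parameter the shift needs is the misalignment k, which is known from the start address; the aligned loads themselves never touch a word boundary.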
With this procedure, an unaligned data block N words long can be loaded into the register file by executing N+1 aligned load instructions and one shift instruction. In other words, the processor according to the first aspect of the present invention can load an unaligned data block with fewer instructions than the related-art procedure, in which the unaligned load instruction must be executed 2N times.
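As a check of the instruction counts stated above (plain arithmetic on those counts, not a measured result), the two procedures compare as follows for an N-word block:

```latex
\underbrace{2N}_{\text{unaligned load instructions}}
\quad\text{vs.}\quad
\underbrace{(N+1) + 1}_{\text{aligned loads + one shift}} = N + 2,
\qquad
2N - (N + 2) = N - 2
```

For N = 2 both procedures need four instructions; the saving appears from N = 3 upward and grows linearly with N, for example 16 versus 10 instructions for N = 8.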
The above and other objects, advantages and features of the present invention will be more apparent from the following description of certain preferred embodiments taken in conjunction with the accompanying drawings, in which:
The invention will now be described herein with reference to illustrative embodiments. Those skilled in the art will recognize that many alternative embodiments can be accomplished using the teachings of the present invention and that the invention is not limited to the embodiments illustrated for explanatory purposes.
The specific embodiment to which the present invention is applied will now be described in detail with reference to the drawings. The same components are denoted by the same reference symbols in the drawings, and the overlapping description thereof will be omitted for the sake of clarity.
The register file 13 is a set of a plurality of registers. In the present embodiment, the register file 13 is assumed to include 32 registers R0 to R31. Each of the registers R0 to R31 is 64 bits long. Note that the number of registers and the register length in the register file 13 are merely examples. The registers R0 to R31 can be used in various ways, for example as accumulators holding input data and output data of the instruction execution part 14, or as address registers specifying addresses when accessing a data memory 51. The registers R0 to R31 store data loaded from the data memory 51 into the processor 1 for processing.
Further, the register file 13 can shift the held data among a plurality of registers selected from the registers R0 to R31. An example configuration of the register file 13 that allows data shifting among registers will be described later.
The instruction execution part 14 executes processing in accordance with the instruction decoded by the instruction decoder 11. More specifically, the instruction execution part 14 includes a plurality of execution units and executes the decoded instruction in the execution unit suited to that instruction, under the control of the controller 12. For example, when an instruction designating an operation, such as an Add instruction or a MAC (Multiply and Accumulate) instruction, is decoded, the instruction execution part 14 executes the designated operation using the data supplied from the register file 13. When a load instruction or a store instruction is decoded, the instruction execution part 14 generates the target address in the data memory 51 and accesses the data memory 51. Specific examples of the execution units included in the instruction execution part 14 include a floating-point arithmetic unit, an integer arithmetic unit, and a load/store unit. Alternatively, the instruction execution part 14 may include a dedicated execution unit specialized for a specific type of processing (digital filter operation, for example).
Although
Hereinafter, a configuration example and a specific operation of the register file 13 will be described with reference to
WR1DATA[63:0] is 64-bit data input from the instruction execution part 14 to the register file 13. WR2DATA[63:0] is 64-bit data input from the data memory 51 to the register file 13. WR1WA[4:0] and WR2WA[4:0] are write addresses of the register file 13. WR1WBRQ and WR2WBRQ are 1-bit logic signals indicating the presence or absence of a write-back request to the register file 13.
RD1[63:0] to RD3[63:0] are data read out from the registers R0 to R31. RA1[4:0] to RA3[4:0] are load addresses of the register file 13. Although the register file 13 is regarded as being capable of simultaneously supplying three data to the instruction execution part 14 in
SFTRQ is a 1-bit logic signal indicating the presence or absence of a request to execute the shift operation in the register file 13. SFTTRG[31:0] is a signal designating which of the registers R0 to R31 are targets of the shift operation. SFTDIR is a 1-bit signal designating the direction of the data shift, and SFTVAL[1:0] is a signal designating the data shift amount.
A write command generator 130 receives the write-back requests WR1WBRQ and WR2WBRQ to the register file 13 and the write addresses WR1WA[4:0] and WR2WA[4:0]. The write command generator 130 outputs the WR1TRG signal to the register corresponding to the write address WR1WA[4:0] when WR1WBRQ is 1, and outputs the WR2TRG signal to the register corresponding to the write address WR2WA[4:0] when WR2WBRQ is 1. The WR1TRG and WR2TRG signals are trigger signals indicating that WR1DATA[63:0] or WR2DATA[63:0], respectively, is to be fetched into the corresponding one of the registers R0 to R31.
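Functionally, the write command generator 130 behaves like a pair of 5-to-32 decoders gated by the write-back requests. The sketch below models each trigger as a 32-bit mask in which bit i drives register Ri; the function name and the mask representation are illustrative assumptions rather than part of the described circuit.

```python
NUM_REGS = 32


def write_command_generator(wr1wbrq, wr1wa, wr2wbrq, wr2wa):
    """Decode the two write ports into 32-bit one-hot trigger masks.

    Bit i of the returned WR1TRG / WR2TRG mask is the trigger for register Ri;
    a mask is all zeros when the corresponding write-back request is 0.
    """
    wr1trg = 1 << (wr1wa % NUM_REGS) if wr1wbrq else 0
    wr2trg = 1 << (wr2wa % NUM_REGS) if wr2wbrq else 0
    return wr1trg, wr2trg


# Port 1 writes R3, port 2 is idle.
print(write_command_generator(1, 3, 0, 7))  # (8, 0)
```

If both ports address the same register in one cycle, both of its trigger bits are set; the data path described next then gives WR1DATA[63:0] priority, because the selector 42 chooses WR1DATA[63:0] whenever WR1TRG is “1”.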
The load data selector 131 receives the load address RA1[4:0]. Then the load data selector 131 selects the register corresponding to the RA1[4:0] from among the registers R0 to R31 and outputs the stored value of the selected register as the load data RD1[63:0]. Similarly, the load data selector 131 receives the load addresses RA2[4:0] and RA3[4:0], and outputs the stored values of the registers corresponding to the addresses as RD2[63:0] and RD3[63:0], respectively.
An AND circuit 132 calculates the logical AND of the 1-bit signal SFTRQ and each bit of the 32-bit signal SFTTRG[31:0], and outputs the result as 32-bit data. In the configuration example of
Each of the registers R0 to R31 can hold 64 bits of data. Adjacent registers among R0 to R31 can be selectively connected, and the data shift operation can be performed between the connected registers. In
The WDO[63:0] output terminal outputs the 64-bit data held in the register element. The LDATA[63:0] terminal receives the 64-bit data held in the lower-side register, and the UDATA[63:0] terminal receives the 64-bit data held in the upper-side register. For example, the LDATA[63:0] terminal of the register R1 (RE_#1) receives the 64-bit data held in the register R0, and the UDATA[63:0] terminal of the register R1 (RE_#1) receives the 64-bit data held in the register R2.
In the configuration of
A shift circuit 41 receives the 64-bit data held in the register 40, the 64-bit data (LDATA[63:0]) held in the lower-side register element, and the 64-bit data (UDATA[63:0]) held in the upper-side register element, and shifts the 192-bit data formed by concatenating these data. The direction and amount of the shift performed by the shift circuit 41 are determined by the SFTDIR signal and SFTVAL[1:0] input to the shift circuit 41.
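The behavior of the shift circuit 41 can be summarized as keeping the middle 64 bits of the shifted 192-bit concatenation UDATA:own:LDATA. The sketch below assumes a shift unit of 16 bits (suggested by the “.H” suffix of the VREGSHR.H mnemonic, but not stated in the text) and an SFTDIR encoding of 0 for a right shift and 1 for a left shift; both are illustrative assumptions, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1
UNIT_BITS = 16  # assumed shift unit (halfword); not stated in the text


def shift_circuit_41(own: int, ldata: int, udata: int, sftdir: int, sftval: int) -> int:
    """Return the new 64-bit value of one register element after a shift.

    The 192-bit value UDATA:own:LDATA is shifted and its middle 64 bits are kept.
    sftdir: assumed encoding, 0 = right (data moves toward the lower-side
            register), 1 = left.
    sftval: SFTVAL[1:0], the shift amount in UNIT_BITS units (0 to 3).
    """
    wide = (udata << 128) | (own << 64) | ldata
    amount = UNIT_BITS * (sftval & 0b11)
    wide = wide >> amount if sftdir == 0 else wide << amount
    return (wide >> 64) & MASK64


own = 0x4444_3333_2222_1111
upper = 0x8888_7777_6666_5555
lower = 0xDDDD_CCCC_BBBB_AAAA
print(hex(shift_circuit_41(own, lower, upper, sftdir=0, sftval=1)))  # 0x5555444433332222
print(hex(shift_circuit_41(own, lower, upper, sftdir=1, sftval=1)))  # 0x333322221111dddd
```

Because the largest shift (three units, 48 bits) is smaller than one register, data from the immediate neighbors is always sufficient, which is why only LDATA and UDATA need to be wired to each element.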
A selector 42 receives WR1DATA[63:0] and WR2DATA[63:0]. Then the selector 42 selects and outputs WR1DATA[63:0] when the WR1TRG supplied from the write command generator 130 is “1”, and selects and outputs WR2DATA[63:0] when the WR1TRG is “0”.
A selector 43 receives the output data of the shift circuit 41 and the output data of the selector 42. Then the selector 43 selects and outputs data supplied from the shift circuit 41 when the SFTTRGX supplied from the AND circuit 132 is “1”, and selects and outputs data supplied from the selector 42 when the SFTTRGX is “0”.
A selector 44 receives the data held in the register 40 and the output data of the selector 43. The selector 44 selects and outputs the data held in the register 40 when the 1-bit logic signal supplied from an OR circuit 45 is “0”. As shown in
The OR circuit 45 calculates the logical OR of WR1TRG, WR2TRG, and SFTTRGX and supplies the result to the control terminal (not shown) of the selector 44. Note that WR1TRG and WR2TRG are trigger signals indicating execution of a write operation into the register 40, and SFTTRGX is a trigger signal indicating execution of the data shift operation.
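Taken together, the selectors 42 to 44 and the OR circuit 45 determine the value that each register element latches on the next clock. The sketch below models one element; it assumes that the selector 44 passes the output of the selector 43 whenever the OR circuit 45 outputs “1” (the case complementary to the one stated above), and the function and parameter names are hypothetical.

```python
def register_element_next(reg40: int, wr1data: int, wr2data: int,
                          wr1trg: bool, wr2trg: bool, sfttrgx: bool,
                          shift_out: int) -> int:
    """Value that register 40 holds after the next clock edge (hypothetical model).

    shift_out stands for the output of the shift circuit 41.
    """
    from_42 = wr1data if wr1trg else wr2data     # selector 42: WR1DATA when WR1TRG is 1
    from_43 = shift_out if sfttrgx else from_42  # selector 43: shift result when SFTTRGX is 1
    update = wr1trg or wr2trg or sfttrgx         # OR circuit 45
    return from_43 if update else reg40          # selector 44: hold when the OR output is 0


print(register_element_next(5, 1, 2, False, False, False, 9))  # 5 (no trigger: hold)
print(register_element_next(5, 1, 2, False, False, True, 9))   # 9 (shift)
```

The ordering of the selectors gives a shift request priority over a write to the same register, and a write from WR1DATA[63:0] priority over one from WR2DATA[63:0].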
Next, a specific example of the data shift operation of the register file 13 will be described.
The right shift instruction denoted by the mnemonic “VREGSHR.H R0, R3” shown in
On the other hand,
As stated above, the processor 1 can selectively shift data among the registers R0 to R31 of the register file 13, in which the data loaded from the data memory 51 is stored. A procedure by which the processor 1 efficiently loads an unaligned data block will be described in detail below.
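Combining the AND circuit 132, the neighbor connections (LDATA from the lower-side register, UDATA from the upper-side register) and the shift circuit 41 gives a register-file-level model of one shift cycle. In the sketch below every register reads the old values of its neighbors, as synchronous hardware would; the 16-bit shift unit, the SFTDIR encoding (0 = right) and the zeros supplied beyond R0 and R31 are assumptions, and the function name is hypothetical.

```python
NUM_REGS = 32
MASK64 = (1 << 64) - 1
UNIT_BITS = 16  # assumed halfword shift unit


def shift_step(regs, sftrq, sfttrg, sftdir, sftval):
    """One clock of the register file when only a shift is requested.

    Each register whose SFTTRGX bit (SFTRQ AND SFTTRG[i]) is set keeps the middle
    64 bits of the shifted UDATA:own:LDATA; all other registers hold their value.
    Neighbors beyond R0 and R31 are assumed to supply zeros.
    """
    s = UNIT_BITS * (sftval & 0b11)
    new = list(regs)
    for i in range(NUM_REGS):
        if not (sftrq and (sfttrg >> i) & 1):  # SFTTRGX for register i
            continue
        ldata = regs[i - 1] if i > 0 else 0
        udata = regs[i + 1] if i < NUM_REGS - 1 else 0
        wide = (udata << 128) | (regs[i] << 64) | ldata
        wide = wide >> s if sftdir == 0 else wide << s
        new[i] = (wide >> 64) & MASK64
    return new


regs = [0] * NUM_REGS
regs[0] = 0x0003_0002_0001_0000  # elements 0..3 (element 0 in the low 16 bits)
regs[1] = 0x0007_0006_0005_0004
regs[2] = 0x000B_000A_0009_0008
regs = shift_step(regs, sftrq=1, sfttrg=0b111, sftdir=0, sftval=1)
# R0 = 0x0004000300020001, R1 = 0x0008000700060005, R2 = 0x0000000B000A0009
```

After one right shift by one element, R0 and R1 hold elements 1 to 8 in aligned form, which is the effect the load procedure relies on.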
A specific example of loading an unaligned data block will be described in detail with reference to
The upper left part of
According to the data load method in the processor 1 of the present embodiment described with reference to
It is apparent that the present invention is not limited to the above embodiments, but may be modified and changed without departing from the scope and spirit of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2007-200606 | Aug 2007 | JP | national |