This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2009-032534, filed on Feb. 16, 2009; the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a microprocessor and a memory-access control method.
2. Description of the Related Art
A microprocessor includes a memory (an instruction memory) in which instructions are stored, an instruction fetch unit that fetches (reads out) an instruction to be executed from the instruction memory, a processing unit that accesses a memory in which data is stored and performs arithmetic operation according to the instruction read out by the instruction fetch unit, and a data memory. The microprocessor can simultaneously perform processing for a plurality of data according to one instruction.
In some instructions executed by the processing unit, the width (the number of bits) of data used in processing indicated by the instruction (data loaded from the data memory) and the memory width of the data memory are not aligned. Therefore, to prevent an increase in latency and a fall in throughput in executing such an instruction, a microprocessor in the past adopts a configuration in which a memory instance is divided to increase the number of banks. Such a microprocessor uses a method of simultaneously accessing all banks in which data designated by an instruction is present.
However, in the method, an area overhead also increases according to the increase in the number of banks.
Power consumption also increases according to the increase in the number of banks simultaneously accessed.
Japanese Patent Application Laid-Open No. 2004-38544 discloses, as an example of the microprocessor in the past, an image processing apparatus in which a fall in performance is suppressed. Japanese Patent Application Laid-Open No. 2002-358288 discloses, as another example of the microprocessor in the past, a semiconductor integrated circuit that efficiently performs single instruction multiple data (SIMD) operation. However, the technologies disclosed in these patent documents do not take into account the problems due to the increase in the number of banks of the data memory.
A microprocessor according to an embodiment of the present invention comprises: a load store unit that loads, when a fetched instruction is a load instruction for data, a data sequence including designated data from a data memory in memory width unit and specifies, based on an analysis result of the instruction, data scheduled to be designated in a load instruction in future in the loaded data sequence; and a data temporary storage unit that stores use-scheduled data as the data specified by the load store unit.
A memory-access control method according to an embodiment of the present invention comprises: loading, when a load instruction for data is fetched, a data sequence including designated data from the data memory in memory width unit; specifying, based on an analysis result of the load instruction, data scheduled to be designated in a load instruction in future in the loaded data sequence; and writing the data specified in the specifying in a data temporary storage unit as use-scheduled data.
Exemplary embodiments of a microprocessor and a memory-access control method according to the present invention will be explained below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments.
First, types of instructions executed by processors according to the embodiments and an example of operation performed when a processor in the past executes the same instructions are explained.
In the example shown in
In the operation shown in
A processor according to a first embodiment of the present invention is explained below. In examples explained in the first embodiment and a second embodiment, processors are SIMD processors. However, the configuration of the processors does not have to be the SIMD type.
The instruction memory 1 is a memory that stores an instruction for controlling the processing unit 4. The instruction fetch unit 2 includes a program counter (pc) 3 that outputs a value indicating a number of an instruction to be executed. The instruction fetch unit 2 extracts an instruction to be executed from the instruction memory 1 according to an output value of the program counter 3.
The processing unit 4 includes an instruction decoder (dec) 5, a plurality of arithmetic elements (p) 6 to 13, and a load store unit (lsu) 14. The processing unit 4 executes various kinds of processing according to the instruction extracted from the instruction memory 1 by the instruction fetch unit 2. Specifically, the processing unit 4 receives the instruction extracted by the instruction fetch unit 2. The instruction decoder 5 decodes the instruction. The load store unit 14 exchanges data with the data memory 16 according to the decoded instruction. The arithmetic elements 6 to 13 execute various kinds of arithmetic operation. The load store unit 14 reads out (loads) data from and writes (stores) data in the data memory 16 in memory width unit. When loaded data includes data scheduled to be designated in the next load instruction as well, the load store unit 14 stores the data in the data temporary storage unit 17. In addition, when data used in processing to be executed by the arithmetic elements next (use-scheduled data) is stored in the data temporary storage unit 17, the load store unit 14 acquires the use-scheduled data.
Formats of various instructions used in the control by the processor according to this embodiment are not specifically limited. However, it is assumed that the load instruction received from the instruction fetch unit 2 includes information concerning whether the data loaded from the data memory 16 is scheduled to be designated in the next load instruction as well.
In repeated execution (n=0, 1, 2, . . . ) of an instruction sequence (m=0, 1, 2, . . . ), when execution inst-m(n) of a certain load instruction m in the repetition n of the instruction sequence is the present load instruction, execution inst-m(n+1) of the load instruction m in repetition n+1 of the instruction sequence is the next load instruction.
The data memory 16 includes two bank areas (a bank #0 and a bank #1). The processing unit 4 can simultaneously refer to the two banks.
The data temporary storage unit 17 includes a control circuit (ctrl) 18, an address generating unit (addr) 19, and a memory (static random access memory (SRAM)) 20 including two banks (a bank A and a bank B). When the data temporary storage unit 17 receives data (D1) scheduled to be used in future from the processing unit 4, the data temporary storage unit 17 stores the data (D1). When the data temporary storage unit 17 receives a readout request for the stored data, the data temporary storage unit 17 outputs the data.
The control circuit (a control unit) 18 reads out data from and writes data in the memory 20 according to control signals S2 and S3 input from the load store unit 14. The address generating unit 19 generates, based on an output value (S1) of the program counter 3, an address for accessing the memory 20. The memory 20 stores, in one of the bank areas, data received from the processing unit 4.
The processor according to this embodiment having the configuration explained above has a function of proceeding with processing in data array unit (equivalent to SD(0), SD(1), . . . , SD(n) shown in
Therefore, in the processor according to this embodiment, when data referred to in inst-m(n+1) as well is present in data read out in inst-m(n), i.e., when the data width designated by the load instruction and the memory width of the data memory are not aligned, the data referred to in inst-m(n+1) as well is stored in the data temporary storage unit 17. For example, in the case of the example shown in
An upper limit of the number of data stored in the data temporary storage unit 17 depends on deviation width from the memory alignment allowed by the processor. Specifically, the banks of the memory (SRAM) 20 of the data temporary storage unit 17 can be limited to bit width enough for storing the number of data equivalent to the deviation width. For example, in the case of the processor that controls only accesses shown in
It is possible to reduce the number of words of the banks (the banks A and B) of the memory 20 by limiting the number of words to the number of instructions that can refer to the data of SD(n−1). For example, when maximum deviation width from the memory alignment that can be designated by the load instruction is 16 bits (16-bit data×1) and an upper limit of the number of issuable load instructions deviating from the memory alignment is thirty-two, the banks A and B only have to have a 16 bit×16 word configuration (a total number of words of the banks A and B is thirty-two). This makes it possible to hold down a memory capacity.
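The sizing arithmetic in the preceding paragraph can be sketched as follows. This is a minimal illustration only; the function name `bank_config` and its parameters are hypothetical, and the rounding-up for counts that do not divide evenly between the banks is an assumption not stated in the text.

```python
def bank_config(max_deviation_bits, max_unaligned_loads, num_banks=2):
    """Return (bits per word, words per bank) for the temporary-storage SRAM.

    Hypothetical helper: each word holds one deviation-width item, and the
    issuable unaligned load instructions are spread evenly over the banks.
    """
    # Ceiling division (an assumption for counts that do not divide evenly).
    words_per_bank = -(-max_unaligned_loads // num_banks)
    return max_deviation_bits, words_per_bank


# Example from the text: 16-bit deviation, thirty-two unaligned loads, two banks.
bits, words = bank_config(16, 32)
total_bits = bits * words * 2
```

With the figures given in the text, this yields a 16 bit x 16 word configuration for each of the banks A and B.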
The data temporary storage unit 17 having the configuration explained above stores, according to PC (S1) as an output signal (a program counter value) from the program counter 3 of the instruction fetch unit 2, MemLdReq (S2) as an output signal from the load store unit 14 of the processing unit 4, and LeftAccess (S3), data received from the load store unit 14 through WData (D1) in the memory 20. The data temporary storage unit 17 outputs the data stored in the memory 20 to the load store unit 14 through RData (D2). The MemLdReq signal (S2) is a signal for requesting output (load) of the data stored by the data temporary storage unit 17. The LeftAccess signal (S3) is a signal indicating that an access deviates from the memory alignment. As explained in detail later, the data temporary storage unit 17 simultaneously performs operation for writing data in one bank of the memory 20 and operation for reading out data from the other bank to thereby prevent a fall in processing speed of the entire processor.
Detailed operation of the data temporary storage unit 17 is explained below together with operations of other sections related thereto in the processor.
When an instruction extracted from the instruction memory 1 by the instruction fetch unit 2 is a load instruction for data and indicates a memory access deviating from the memory alignment, the load store unit 14 asserts (activates) the MemLdReq signal S2 and the LeftAccess signal S3 for access to the data temporary storage unit 17.
When the data temporary storage unit 17 detects that the MemLdReq signal S2 is asserted, the data temporary storage unit 17 performs readout operation from the memory 20. This cycle is referred to as L0 below.
Specifically, first, the control circuit 18 calculates AND of the MemLdReq signal S2 and the LeftAccess signal S3 to generate a signal (PBuffReadReq) indicating the readout operation from the memory 20. To perform write operation explained below continuously from the readout operation, the control circuit 18 writes PBuffReadReq in a register as rPBuffReq.
The address generating unit 19 generates, based on an input program counter value (hereinafter, “PC value”), an address signal (ReadAddress) indicating an access destination of the memory 20 and a bank selection signal (ReadBankSel). More specifically, the address generating unit 19 outputs a least significant bit of the PC value as the bank selection signal and outputs the remaining bits as the address signal. Consequently, because banks to be used are reversed according to load instructions having continuous PC values, it is possible to continuously perform update operation explained later. ReadBankSel and ReadAddress are written in the register as rBankSel and rAddress to be referred to in the next cycle (L1).
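The address generation described above can be sketched as follows. This is a minimal illustration, assuming the PC value is a non-negative instruction number; the function name `generate_read_address` is hypothetical.

```python
def generate_read_address(pc_value):
    """Split a PC value into (ReadBankSel, ReadAddress).

    The least significant bit selects the bank (0 = bank A, 1 = bank B)
    and the remaining bits form the address within the bank.
    """
    read_bank_sel = pc_value & 1
    read_address = pc_value >> 1
    return read_bank_sel, read_address


# Load instructions with continuous PC values alternate between the banks.
pairs = [generate_read_address(pc) for pc in range(4, 8)]
```

For the PC values 4 to 7, the bank selection alternates 0, 1, 0, 1 while the address advances once every two instructions, which is what allows a readout for one load instruction to overlap the update for the preceding one.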
When PBuffReadReq is asserted, the control circuit 18 selects a bank according to ReadBankSel. Specifically, when ReadBankSel is 0, the control circuit 18 enables a bank-A readout request signal (ReadBankA) and, when ReadBankSel is 1, the control circuit 18 enables a bank-B readout request signal (ReadBankB).
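The generation of the readout request and the bank selection can be sketched as combinational logic. This is an illustration only; the function name `select_read_bank` is hypothetical, and the signals are modeled as Python booleans and integers rather than wires.

```python
def select_read_bank(mem_ld_req, left_access, read_bank_sel):
    """Return (PBuffReadReq, ReadBankA, ReadBankB).

    PBuffReadReq is the AND of MemLdReq and LeftAccess. When it is
    asserted, ReadBankSel = 0 enables ReadBankA and ReadBankSel = 1
    enables ReadBankB.
    """
    pbuff_read_req = bool(mem_ld_req and left_access)
    read_bank_a = pbuff_read_req and read_bank_sel == 0
    read_bank_b = pbuff_read_req and read_bank_sel == 1
    return pbuff_read_req, read_bank_a, read_bank_b
```

At most one of the two bank readout requests is enabled in a cycle, and neither is enabled for a load instruction that does not deviate from the memory alignment.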
In the control circuit 18, a readout request (ReadBankA) and a readout address (ReadAddress) are input to a bank-A control circuit. The bank-A control circuit enables a bank-A access request (Req(A)) unless the input readout request (ReadBankA) and a write request explained later conflict with each other. Similarly, a readout request (ReadBankB) and a readout address (ReadAddress) are input to the bank-B control circuit. The bank-B control circuit enables a bank-B access request (Req(B)) unless the input readout request (ReadBankB) and a write request explained later conflict with each other.
The control circuit 18 selects, according to rBankSel, one of data output from the bank A and the bank B of the memory 20 and outputs the selected data to the load store unit 14 as the readout data RData (D2) of the data temporary storage unit 17.
The load store unit 14 receives the data output from the data temporary storage unit 17. As shown in the upper section of
Specifically, a bank and an address indicating the area to be updated are the same as those during the readout. Therefore, in the update operation, the control circuit 18 reads out rBankSel and rAddress from the registers in which the values used in the cycle L0 are stored and sets the values as a bank selection signal WriteBankSel and an address WriteAddress for update.
The control circuit 18 reads out a value from the register that stores rPBuffReq representing that the readout operation is performed in the cycle L0 and sets the value as a write request signal PBuffWriteReq. When PBuffWriteReq is asserted, the control circuit 18 selects a bank according to WriteBankSel. Specifically, when WriteBankSel is 0, the control circuit 18 enables a bank-A write request signal (WriteBankA) and, when WriteBankSel is 1, the control circuit 18 enables a bank-B write request signal (WriteBankB).
In the control circuit 18, the write request (WriteBankA) and the write address (WriteAddress) are input to the bank-A control circuit. The bank-A control circuit enables the bank-A access request (Req(A)) unless the input write request (WriteBankA) and the readout request (ReadBankA) conflict with each other. Similarly, the write request (WriteBankB) and the write address (WriteAddress) are input to the bank-B control circuit. The bank-B control circuit enables the bank-B access request (Req(B)) unless the input write request (WriteBankB) and the readout request (ReadBankB) conflict with each other.
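The readout in the cycle L0 and the update in the following cycle L1 can be sketched as a cycle-level behavioral model. This is a simplified illustration, not the circuit itself: the class `TempStorage`, the method `cycle`, and the parameter `words_per_bank` are hypothetical names, the banks are assumed to be cleared to zero initially, and raising an error when a readout and a write target the same bank is an assumption, because the handling of a conflict is not modeled here.

```python
class TempStorage:
    """Behavioral sketch of the two-bank data temporary storage unit."""

    def __init__(self, words_per_bank=16):
        # banks[0] models bank A, banks[1] models bank B (assumed zero-initialized).
        self.banks = [[0] * words_per_bank for _ in range(2)]
        # Registers written in cycle L0 and referred to in cycle L1.
        self.r_pbuff_req = False
        self.r_bank_sel = 0
        self.r_address = 0

    def cycle(self, pc, mem_ld_req, left_access, wdata=0):
        """Advance one cycle; return RData, or None when no readout occurs.

        wdata is the WData for the load instruction that performed its
        readout in the previous cycle.
        """
        # Update request for the previous load (PBuffWriteReq = rPBuffReq).
        write_req = self.r_pbuff_req
        write_bank, write_addr = self.r_bank_sel, self.r_address
        # Readout request for the present load.
        read_req = bool(mem_ld_req and left_access)
        read_bank, read_addr = pc & 1, pc >> 1
        if read_req and write_req and read_bank == write_bank:
            # Assumption: a conflict is simply reported in this sketch.
            raise RuntimeError("readout and write conflict on one bank")
        rdata = self.banks[read_bank][read_addr] if read_req else None
        if write_req:
            self.banks[write_bank][write_addr] = wdata
        # Latch the values for the update in the next cycle.
        self.r_pbuff_req = read_req
        self.r_bank_sel, self.r_address = read_bank, read_addr
        return rdata


ts = TempStorage()
ts.cycle(4, True, True)                  # L0 of the load at PC 4 (bank A)
ts.cycle(5, True, True, wdata=0xAAAA)    # L1 of PC 4 and L0 of PC 5 (bank B)
ts.cycle(0, False, False, wdata=0xBBBB)  # L1 of PC 5
```

In the second cycle of the usage example, the write to the bank A for the load at PC 4 and the readout from the bank B for the load at PC 5 proceed together, because continuous PC values select opposite banks.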
The control circuit 18 gives the memory 20 the access request (Req(A) or Req(B)) and write data WData (D1) received from the load store unit 14 to update the data. WData (D1) is obtained by selecting data of a section referred to during execution of the next instruction (inst-m(n+1)) (in the operation example shown in
In the data temporary storage unit 17 shown in
In the above explanation, the data readout operation and the data write operation for one bank of the memory 20 are explained. However, the processor applies opposite operation to the other bank in parallel to the data readout operation or the data write operation (when the data readout operation is applied to one bank, the data write operation is applied to the other bank) to thereby prevent a fall in processing speed of the processor as a whole (see
As explained above, in executing a load instruction in which the width of reference data (processing target data) and the memory width of the data memory are not aligned, when data referred to in a load instruction to be executed next time (data scheduled to be designated in the load instruction to be executed next time) is included in a data sequence to be loaded, the processor according to this embodiment stores the data in the data temporary storage unit. The processor reads out the stored data from the data temporary storage unit during execution of the next load instruction. The processor reads out, from the data memory, the remaining processing target data other than the data read out from the data temporary storage unit (data not stored in the data temporary storage unit among the data designated by the load instruction). The processor executes, in parallel, processing for reading out data from one bank in the memory and processing for writing data in the other bank. This makes it possible to reduce, compared with the past, the number of banks in the data memory provided to prevent an increase in latency and a fall in throughput in executing an instruction in which the width of reference data and the memory width are not aligned. As a result, it is possible to realize a processor that holds down an area overhead and power consumption while maintaining processing performance.
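The overall behavior summarized above can be sketched at the data level. This is one possible reading, under stated assumptions: each repetition loads exactly one memory-width data sequence, the processing target data of a repetition consists of the data carried over from the preceding data sequence followed by the leading part of the present data sequence, and the temporary storage initially holds zeros. The function name `run_unaligned_loads` and its parameters are hypothetical.

```python
def run_unaligned_loads(rows, deviation):
    """Model repeated execution of one load instruction deviating from alignment.

    rows: data sequences of the data memory, one memory width each.
    deviation: number of data carried over to the next repetition.
    Returns the processing target data of each repetition.
    """
    width = len(rows[0])
    carried = [0] * deviation  # assumed initial contents of the temporary storage
    results = []
    for row in rows:
        # A single memory-width access to the data memory per load instruction.
        remaining = row[:width - deviation]
        # Stored data from the temporary storage plus the remaining data.
        results.append(carried + remaining)
        # Store the data scheduled to be designated next time.
        carried = row[width - deviation:]
    return results


rows = [list(range(0, 8)), list(range(8, 16)), list(range(16, 24))]
targets = run_unaligned_loads(rows, deviation=1)
```

In this sketch, each repetition accesses the data memory only once, and the data that would otherwise require access to a second bank is supplied from the temporary storage.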
In the technology disclosed in Japanese Patent Application Laid-Open No. 2004-38544, in some cases, data transfer time from an input line buffer to an SIMD processor increases. Specifically, when data transfer speed is A bit/cycle and the bit width (the number of bits) of data used in SIMD processing is B, transfer time is B/A cycles. For example, when A is 16 and B is 128, the transfer time is 8 cycles. Therefore, waiting time from the storage of data in the input line buffer until the start of SIMD operation occurs. In the technology disclosed in Japanese Patent Application Laid-Open No. 2002-358288, the use of a data buffer of a dual port is a premise. However, in the SIMD processor according to this embodiment, the waiting time until the start of arithmetic operation (waiting time equal to or longer than two cycles) does not occur and the use of a data buffer of a dual port is not a premise.
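The transfer time B/A mentioned above can be checked with a short calculation. The function name `transfer_cycles` is hypothetical, and rounding up when B is not a multiple of A is an assumption not stated in the text.

```python
def transfer_cycles(a_bits_per_cycle, b_bits):
    """Cycles needed to transfer B bits at A bits per cycle (rounded up)."""
    return -(-b_bits // a_bits_per_cycle)


cycles = transfer_cycles(16, 128)
```

For A = 16 and B = 128, the result is 8 cycles, matching the example in the text.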
In the processor according to the first embodiment, the address generating unit 19 of the data temporary storage unit 17 uses a least significant bit of a program counter value (PC value) as a bank select signal and uses the remaining bits as an address signal (see
As shown in
When the address generating unit 19a explained above is adopted, it is possible to realize a processor that obtains the same effects as those of the processor according to the first embodiment.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2009-032534 | Feb 2009 | JP | national |