CACHE MEMORY HAVING PIPELINE STRUCTURE AND METHOD FOR CONTROLLING THE SAME

Information

  • Patent Application
  • 20080098174
  • Publication Number
    20080098174
  • Date Filed
    October 24, 2007
  • Date Published
    April 24, 2008
Abstract
One embodiment of the present invention is a cache memory arranged between a processor and a low-speed memory and performing a pipeline processing of a memory access made by the processor. In a first stage, the cache memory reads out a tag address from a tag memory. In a second stage, the cache memory performs a hit decision by a hit decision unit. When the hit decision result is a miss hit, the cache memory performs, in a third stage, an update control of the tag memory and a behavior control of a bypass circuit for supplying data held in a latch circuit to the hit decision unit by bypassing the tag memory. The latch circuit is configured to hold a tag address included in an input address supplied from the processor. When the hit decision result is the miss hit, the cache memory further performs, in a fourth stage just after the third stage or in a later stage, an update of the data memory by reading out the data from the low-speed memory and an outputting of the data read out from the low-speed memory to the processor.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, advantages and features of the present invention will be more apparent from the following description of certain preferred embodiments taken in conjunction with the accompanying drawings, in which:



FIG. 1 is a configuration diagram of a cache memory according to an embodiment of the present invention;



FIGS. 2 to 4 are diagrams showing a pipeline process made by the cache memory according to the embodiment of the present invention;



FIG. 5 is a configuration diagram of a cache memory of a Related Art;



FIG. 6 is a diagram showing a cache memory pipeline process of the Related Art; and



FIG. 7 is a diagram showing an example of an input address.





DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The invention will now be described herein with reference to illustrative embodiments. Those skilled in the art will recognize that many alternative embodiments can be accomplished using the teachings of the present invention and that the invention is not limited to the embodiments illustrated for explanatory purposes.


A specific embodiment to which the present invention applies will now be described in detail below with reference to the drawings. In each drawing, the same reference numerals are used for the same components. The overlapping description is appropriately omitted for the sake of clarity.


A configuration of a cache memory 1 according to the present embodiment is shown in FIG. 1. The cache memory 1 is a four-way set associative type cache memory. The four-way set associative configuration is assumed here so that the cache memory 1 can easily be compared with a cache memory 8 of a Related Art shown in FIG. 5. However, such a configuration is merely one example. The number of ways of the cache memory 1 may be other than four, or the cache memory 1 may be a direct-mapped type cache memory.


The components of a data memory 10, a tag memory 11, a hit decision unit 12, and an address latch 14, all of which are included in the cache memory 1, are the same as the components shown in FIG. 5. Therefore, the same reference numerals are given to the corresponding components and detailed description will be omitted here.


A behavior of a controller 13 included in the cache memory 1 is the same as a behavior of a controller 83 of the Related Art when a hit decision result is a cache hit. More specifically, the controller 13 controls reading out of data from the data memory 10 by outputting a chip select signal (CS signal) and a read strobe signal (RS signal) to the data memory 10 when it is decided by the hit decision unit 12 that the result is the cache hit. On the other hand, when it is decided by the hit decision unit 12 that the result is the miss hit, the controller 13 controls rewriting of the tag memory 11 in order to store the tag address included in the input address in the tag memory 11, data refilling of the data memory 10, and a behavior of a selector 19 described below.


The cache memory 1 has more latch circuits for holding intermediate data between the pipeline stages than the cache memory 8 shown in FIG. 5 has, because the cache memory 1 adopts the four-stage pipeline structure. Address latches 15 to 17 and 20 and a data latch 21 correspond to these latch circuits. The address latch 15 is a circuit for holding at least an index address part and a word address part of the input address. The address latches 16 and 17, like the address latch 14, are circuits for holding at least a tag address part of the input address.
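The division of the input address into a tag address part, an index address part, and a word address part can be illustrated by the following minimal sketch. The field widths WORD_BITS and INDEX_BITS and the function name split_address are assumptions made only for illustration; the description does not specify them.

```python
# Hypothetical field widths; the description does not give actual values.
WORD_BITS = 4    # assumed width of the word address part
INDEX_BITS = 8   # assumed width of the index address part


def split_address(addr):
    """Split an input address into (tag, index, word) parts."""
    word = addr & ((1 << WORD_BITS) - 1)
    index = (addr >> WORD_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (WORD_BITS + INDEX_BITS)
    return tag, index, word
```

Under these assumed widths, the tag part is what the address latches 14, 16 and 17 would hold, and the index and word parts are what the address latch 15 would hold.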


The address latch 20 is a circuit for holding the intermediate data between the selector 19 and the hit decision unit 12. The address latch 20 is configured to be able to hold four tag addresses output from the tag memory 11 in response to the input address. For example, the address latch 20 may have four D flip flop circuits, each of which can hold one tag address.


The data latch 21 is a circuit for holding data output from the data memory 10. In other words, the data latch 21 is arranged to divide the process of accessing the data memory 10 and the process of transferring the data to the processor 2 into separate pipeline stages.


A bypass line 18 and the selector 19 comprise a bypass circuit for inputting the data held in the address latch 17 to the hit decision unit 12 by bypassing the tag memory 11. An operation of the selector 19 is controlled by a control signal (SC signal) output from the controller 13.


Referring now to FIG. 2, a behavior of the cache memory 1 will be described. FIG. 2 shows a pipeline behavior of the cache memory 1 when a load request made by the processor 2 is processed. Part (a) of FIG. 2 shows the behavior when the hit decision result is the cache hit and part (b) of FIG. 2 shows the behavior when the hit decision result is the miss hit. In a first stage of the pipeline, the tag memory 11 receives the input address supplied from the processor 2 and outputs four tag addresses corresponding to the index address of the input address. The four tag addresses output from the tag memory 11 are held in the address latch 20 through the selector 19.


Next, in a second stage just after the first stage, the hit decision is made by the hit decision unit 12. The hit decision unit 12 compares the tag address included in the input address held in the address latch 16 with the tag address held in the address latch 20.
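The comparison made in the second stage can be sketched as follows. This is only an illustrative model under stated assumptions: the function name hit_decision is hypothetical, the four tags held in the address latch 20 are modeled as a list, and validity bits, which the description does not discuss, are omitted.

```python
def hit_decision(input_tag, latched_tags):
    """Compare the tag of the input address with each latched tag.

    Returns the number of the matching way, or None for a miss hit.
    """
    for way, tag in enumerate(latched_tags):
        if tag == input_tag:
            return way
    return None
```

For example, with latched tags [0x10, 0x11, 0x12, 0x13], an input tag of 0x12 matches way 2, whereas an input tag of 0x20 matches no way and the result is the miss hit.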


When the decision made by the hit decision unit 12 is the cache hit, the input address, the CS signal, and the RS signal are input to the data memory 10 at a last part of the second stage. Then as shown in the part (a) of FIG. 2, in a third stage just after the second stage, the data is read out from the data memory 10 and the data which is read out is held in the data latch 21. Lastly, in a fourth stage just after the third stage, the data held in the data latch 21 is transferred to the processor 2 and is stored in a storage area of the processor 2 such as a general register.


On the other hand, when the decision made by the hit decision unit 12 is the miss hit, the outputs of the CS signal and the RS signal at the last part of the second stage are not performed. Then as shown in the part (b) of FIG. 2, in the third stage just after the second stage, the controller 13 performs a process of deciding a replacement way and an update process of the tag address decided as the replacement way held in the tag memory 11 with the tag address included in the input address. The decision of the replacement way can be performed by using decision methods such as a random method for selecting the way at random from the four ways or an LRU (Least Recently Used) method for selecting the way that has not been referred to for the longest period of time.
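The two decision methods mentioned above can be sketched as follows. The function names and the last_used bookkeeping (a per-way record of the cycle in which each way was last referred to) are assumptions for illustration; the description does not state how the controller 13 tracks references.

```python
import random


def choose_way_random(num_ways=4, rng=random):
    """Random method: select one of the ways at random."""
    return rng.randrange(num_ways)


def choose_way_lru(last_used):
    """LRU method: select the way with the oldest reference.

    last_used[w] is an assumed record of the cycle in which way w
    was last referred to.
    """
    return min(range(len(last_used)), key=lambda w: last_used[w])
```

With last_used = [7, 3, 9, 5], the LRU sketch selects way 1, which was referred to least recently.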


Moreover, the controller 13 controls the selector 19 in the third stage and updates the address latch 20 which holds the tag address corresponding to the replacement way with a storage value of the address latch 17, in other words the tag address of the input address.


The controller 13 performs the process of deciding the replacement way and the update process of the tag memory 11, and performs controlling of the selector 19 as described above in the first clock cycle of the third stage, which means in the C3 cycle shown in the part (b) of FIG. 2. The controller 13 also stalls the pipeline behavior by one clock cycle by outputting a WAIT signal.


In the fourth stage just after the second cycle of the third stage in which the pipeline behavior was stalled, a read access is performed to the main memory 3 connected to the memory bus 6. Then the data corresponding to the input address is read out from the main memory 3 and is stored in the data memory 10. Also in the same fourth stage, the data read out from the main memory 3 is output to the processor 2.
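The cycle assignment described for parts (a) and (b) of FIG. 2 can be summarized by the following sketch. The function name load_schedule and the textual labels are assumptions for illustration, and the read access to the main memory 3 is shown as starting in one cycle although it may take longer in practice.

```python
def load_schedule(hit):
    """Return a mapping from cycle number (C1, C2, ...) to the activity
    of one load request, for the cache hit or the miss hit case."""
    if hit:
        return {
            1: "read tag memory",
            2: "hit decision",
            3: "read data memory into data latch",
            4: "transfer data to processor",
        }
    return {
        1: "read tag memory",
        2: "hit decision",
        3: "decide replacement way, update tag memory, control selector",
        4: "stall (WAIT signal)",
        5: "read main memory, refill data memory, output to processor",
    }
```

The miss hit case occupies one more cycle before its last stage than the cache hit case, corresponding to the one-cycle stall caused by the WAIT signal.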


Referring now to FIGS. 3 and 4, an effect of the cache memory 1 working as above will be described. FIG. 3 is a timing chart showing the pipeline processing of the cache memory 1 when two load requests (load requests A and B) are successively received. More specifically, FIG. 3 shows the process when the miss hit occurs in the preceding load request A.


As shown in FIG. 3, when the decision result in the second stage of the load request A (m+1 stage) is the miss hit, the replacement way is decided and the tag memory 11 is rewritten in the subsequent first cycle (C3 cycle) of the third stage (m+2 stage). And the tag address that is to be stored in the tag memory 11 is supplied to the address latch 20 by bypassing the tag memory 11 by the bypass line 18 and the selector 19. Then the pipeline is stalled by one cycle.


Note that the process in response to the subsequent load request B has begun in parallel with the process in response to the above-described load request A. Specifically, in the m+1 stage which is the second stage of the load request A, the tag address is read out from the tag memory 11 as the process in the first stage of the load request B. In other words, when the tag address in the load request B is read out, the update of the tag memory 11 in response to the miss hit of the preceding load request A has not completed. In the second stage of the load request B (m+2 stage), the hit decision is made about the load request B. This hit decision is however performed without reflecting the update result of the tag memory 11 in response to the miss hit of the preceding load request A.


However, the hit decision of the load request B is performed again in the second cycle of the m+2 stage (C4 cycle), in which the pipeline behavior is stalled. This repeated hit decision is made using the new tag address given to the address latch 20 by bypassing the tag memory 11.
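The interaction between the load requests A and B can be modeled by the following sketch. It is a simplified software model under stated assumptions, not the circuit itself: the function names are hypothetical, the tags of one set are modeled as a list, and the replacement way is passed in rather than decided.

```python
def hit_decision(input_tag, latched_tags):
    """Return the matching way number, or None for a miss hit."""
    for way, tag in enumerate(latched_tags):
        if tag == input_tag:
            return way
    return None


def simulate_bypass(tag_a, tag_b, tag_memory_set, replacement_way):
    """Model load A (miss hit) followed by load B on the same set.

    Returns (decision for B on stale tags, decision for B after bypass).
    """
    # m+1 stage: the first stage of load B reads the tags before the
    # tag memory has been updated for load A.
    latch_20 = list(tag_memory_set)
    # First cycle of the m+2 stage (C3): the decision for B uses stale tags.
    stale_result = hit_decision(tag_b, latch_20)
    # Same cycle: the tag memory is rewritten, and the selector routes the
    # tag of load A (held in address latch 17) into address latch 20.
    tag_memory_set[replacement_way] = tag_a
    latch_20[replacement_way] = tag_a
    # Second cycle (C4, pipeline stalled): the decision for B is repeated.
    bypassed_result = hit_decision(tag_b, latch_20)
    return stale_result, bypassed_result
```

If load B targets the line being refilled for load A, the stale decision is a miss hit while the repeated decision is a cache hit, which avoids an unwanted second refill. If load B targets the line being replaced, the stale decision is a cache hit while the repeated decision is a miss hit, which avoids outputting incorrect data.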


As stated above, according to the cache memory 1 of the present embodiment, it is possible to reflect the update result of the tag memory 11 due to the occurrence of the miss hit in a preceding memory access request on the hit decision in a subsequent memory access request even when the miss hit occurs in the preceding memory access request. Therefore, it is possible to prevent an incorrect decision when the hit decision is made in response to the subsequent memory access request, to suppress an unwanted data refill behavior, and to avoid outputting the incorrect data from the data memory 10.


Moreover, as shown in FIG. 3, in the cache memory 1, even when the miss hit occurs in the preceding memory access request, there is no need to retry the process of the subsequent memory access request again from the process of reading out of the tag memory 11. Therefore, a redundant hardware is not needed for performing the process again from the process of reading out of the tag memory 11. In addition, it is possible to prevent cache access time of the subsequent memory access request from being increased because there is no need to retry the process of reading out of the tag memory 11.


Moreover, the cache memory 1 is effective in a point below. FIG. 4 is a timing chart showing a case in which a direct store request and the load request are successively performed. In the direct store request, the processor 2 writes the data directly to the main memory 3 without involving the cache memory 1. The load request is made for the cache memory 1, as will be clear from the above description. The pipeline of the direct store request shown in the part (b) of FIG. 4 is the pipeline of the processor 2 and the pipeline of the load request shown in the part (c) of FIG. 4 is the pipeline of the cache memory 1.


The direct store request shown in the part (b) of FIG. 4 is performed in a six-stage pipeline from an IF stage (C1 cycle) to an EXE stage (C6 cycle) in the processor 2. More specifically, in the IF stage (C1 cycle), an instruction is fetched from an instruction cache. In an ID stage (C2 cycle), the fetched instruction is decoded. In an ADR stage (C3 cycle), a calculation of an effective address is performed. In the EXE stage (C6 cycle), the data is written into the main memory 3. When the data is written into the main memory 3, it is more common to write the data first into a store buffer (not shown) than to write the data directly to the main memory 3, for the purpose of preventing the processor 2 from being stalled due to a difference between a bus speed of the memory bus 6 and a processing speed of the processor 2. When the data is written first to the store buffer, only the outputting of the data to the store buffer is performed in the EXE stage (C6 cycle). The data is written to the main memory 3 in the C7 cycle or in a later cycle which comes after the EXE stage.


As stated above, when the actual access to the main memory 3 by the direct store request is made in the C7 cycle or in a later cycle, the data is still in the middle of being written into the store buffer in the C6 cycle, and the access request made by the store buffer has not yet been output to the memory bus 6. Therefore, if the access request by the cache memory 1 in which the miss hit is detected is made first in the C6 cycle in FIG. 4, the data is read out first from the main memory 3 due to the miss hit because there is no competing access request, and the access to the memory bus 6 by the previous store instruction (direct store access) is performed later. If both the direct store request and the load request are made for the same address, the program can no longer be executed correctly because the process order has been interchanged.


However, in the cache memory 1 according to the present invention, the data is read out from the main memory 3 in the last stage of the pipeline when the miss hit occurs. Therefore, in the timing chart shown in the part (c) of FIG. 4, the cache memory 1 does not access the memory bus 6 before a C8 cycle, which means the access to the memory bus 6 by the store buffer has begun before the access is made by the cache memory 1. Note that, in the part (c) of FIG. 4, a CMP stage in a C5 cycle is the pipeline stage in which the hit decision is made. A WTG stage in a C6 cycle is the pipeline stage in which the tag memory 11 is updated due to the miss hit. An MAC stage in the C8 cycle or in a later cycle is the pipeline stage in which the data is read out from the main memory 3 and the data which is read out is transferred to the processor 2. As shown in FIG. 4, if the access request to the memory bus 6 made by the store buffer (not shown) and the access request to the memory bus 6 made by the cache memory 1 occur concurrently, or if the access request made by the cache memory 1 occurs later than the access request made by the store buffer, a bus controller (not shown) of the memory bus 6 can control the access requests according to the order in which the instructions are performed. Therefore, even when both the direct store request and the load request are made for the same address, the program can be executed correctly.
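The ordering argument above can be illustrated by the following sketch of a bus controller. The model is an assumption made for illustration only: the description does not disclose the arbitration policy, so the sketch simply grants requests in order of arrival cycle and resolves concurrent requests by program order. The function name grant_order and the tuple layout are hypothetical.

```python
def grant_order(requests):
    """Return requester names in the order the bus is granted.

    Each request is (name, request_cycle, program_order). Earlier
    cycles are granted first; concurrent requests are granted in
    program order (assumed policy).
    """
    ordered = sorted(requests, key=lambda r: (r[1], r[2]))
    return [name for name, _cycle, _order in ordered]


# Store buffer requests the bus in C7; cache memory requests it in C8.
late_load = grant_order([("load", 8, 1), ("store", 7, 0)])

# A cache memory that requested the bus in C6 would overtake the store.
early_load = grant_order([("load", 6, 1), ("store", 7, 0)])
```

In this model, late_load is ["store", "load"], so the program order is preserved, whereas early_load is ["load", "store"], which corresponds to the interchanged order that the last-stage access to the main memory 3 is intended to avoid.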


Note that the configuration of the cache memory 1 is merely one example, and various changes can be made. For example, the number of pipeline stages is not limited to four. In addition, the specific configuration of the controller 13 may be a set of a plurality of circuits. For example, the controller 13 may calculate the replacement way, control the selector 19, and control the access to the main memory 3 by using separate circuits respectively.


Also in other embodiments of the present invention, there is a cache memory including a part of the configuration included in the cache memory 1 described above. More specifically, we assume the cache memory for performing the process of reading out of the tag memory 11 and the process of the hit decision in the separate pipeline stages. And when the miss hit occurs, the decision of the replacement way and rewriting of the tag memory 11 are performed in the pipeline stage just after the pipeline stage which performs the hit decision. At the same time, the tag address corresponding to the access request in which the result was the miss hit is forwarded to the hit decision of the subsequent access request by bypassing the tag memory 11. By having such a configuration, the error of the hit decision can be prevented without retrying the subsequent access request from the beginning of the pipeline.


In addition, the cache memory 1 as described above stalls the pipeline by one cycle when the miss hit occurs. This configuration is effective in that information of the tag address which is to be replaced can definitely be reflected on the hit decision of the subsequent memory access request. However, it is also possible that the information of the tag address which is to be replaced can be reflected on the hit decision of the subsequent memory access request without stalling the pipeline by speeding up the decision process of the replacement way by deciding the replacement way in the random method, for example.


Furthermore, it is apparent that the present invention is not limited to the above embodiment, but may be modified and changed without departing from the scope and spirit of the invention.

Claims
  • 1. A cache memory arranged between a processor and a low-speed memory and performing a pipeline processing of a memory access made by the processor, comprising: a data memory being configured to store data corresponding to a subset of the low-speed memory; a tag memory being configured to store tag addresses corresponding to the data stored in the data memory; a hit decision unit being configured to decide whether there is a cache hit or a miss hit by comparing at least one tag address acquired by searching the tag memory using an index address included in an input address supplied from the processor with a tag address included in the input address; a latch circuit being configured to hold the tag address included in the input address; a bypass circuit being configured to provide the tag address held by the latch circuit to the hit decision unit by bypassing the tag memory; and a controller being configured to control an update process of the tag memory by the tag address included in the input address, an update process of the data memory by reading out of the data from the low-speed memory, and a behavior of the bypass circuit when the hit decision result is the miss hit.
  • 2. The cache memory according to claim 1, wherein the cache memory performs: a process of reading out of the tag address from the tag memory using the index address in a first pipeline stage; a decision process by the hit decision unit in a second pipeline stage after the first pipeline stage; and a process of controlling of an update of the tag memory by the controller and controlling of the bypass circuit in order to input the tag address held by the latch circuit to the hit decision unit by bypassing the tag memory in the third pipeline stage, and an update process of the data memory by reading out of the data from the low-speed memory and a process of outputting of the data read out from the low-speed memory to the processor in a fourth pipeline stage just after the third pipeline stage or in a later pipeline stage when the hit decision result at the second pipeline stage is the miss hit.
  • 3. The cache memory according to claim 2, wherein the update process of the data memory by reading out of the data from the low-speed memory is performed in a last pipeline stage.
  • 4. The cache memory according to claim 2, wherein at least two clock cycles are assigned to the third pipeline stage, the controller performs the update of the tag memory by a clock cycle before a final clock cycle of the third pipeline stage, and the controller controls the behavior of the bypass circuit so that the tag address held by the latch circuit is input to the hit decision unit in the final clock cycle when the hit decision result by the hit decision unit is the miss hit.
  • 5. The cache memory according to claim 1, wherein: the bypass circuit comprises a selector being configured to selectively output one of the tag address provided from the latch circuit and the tag address provided from the tag memory to the hit decision unit, and the controller causes the selector to select the output of the tag address provided from the latch circuit in response to the hit decision result made by the hit decision unit being the miss hit.
  • 6. A method for controlling a cache memory that processes a memory access made by a processor in at least a four-stage pipeline, the cache memory comprising: a data memory being configured to store data corresponding to a subset of a low-speed memory; a tag memory being configured to store tag addresses corresponding to the data stored in the data memory; a hit decision unit being configured to decide whether there is a cache hit or a miss hit by comparing at least one tag address acquired by searching the tag memory using an index address included in an input address supplied from the processor with a tag address included in the input address; and a bypass circuit being configured to provide the tag address included in the input address to the hit decision unit by bypassing the tag memory, wherein the method comprises: executing a process of reading out of the tag address from the tag memory using the index address in a first pipeline stage; executing a decision process by the hit decision unit in a second pipeline stage after the first pipeline stage; and executing a process of controlling of an update of the tag memory by the controller and controlling of the bypass circuit for inputting the tag address included in the input address to the hit decision unit by bypassing the tag memory in the third pipeline stage, and an update process of the data memory by reading out of the data from the low-speed memory and a process of outputting of the data read out from the low-speed memory to the processor in a fourth pipeline stage just after the third pipeline stage or in a later pipeline stage when the hit decision result at the second pipeline stage is the miss hit.
  • 7. The method according to claim 6, wherein the update process of the data memory by reading out of the data from the low-speed memory is performed in a last pipeline stage.
  • 8. The method according to claim 6, wherein the method further comprises assigning at least two clock cycles to the third pipeline stage, wherein the update of the tag memory is performed by a clock cycle before a final clock cycle of the third pipeline stage, and the behavior of the bypass circuit is controlled so that the tag address included in the input address is input to the hit decision unit in the final clock cycle when the hit decision result by the hit decision unit is the miss hit.
  • 9. A cache memory arranged between a processor and a low-speed memory and performing a pipeline processing of a memory access made by the processor, comprising: a data memory being configured to store data corresponding to a subset of the low-speed memory; a tag memory being configured to store tag addresses corresponding to the data stored in the data memory; a hit decision means for deciding whether there is a cache hit or a miss hit by comparing at least one tag address acquired by searching the tag memory using an index address included in an input address supplied from the processor with a tag address included in the input address; latching means for holding the tag address included in the input address; selecting means for selectively providing the tag address held by the latching means to the hit decision means as a substitute for the tag address output from the tag memory; and control means for controlling an update process of the tag memory by the tag address included in the input address, an update process of the data memory by reading out of the data from the low-speed memory, and a behavior of the selecting means when the hit decision result is the miss hit.
  • 10. The cache memory according to claim 9, wherein the cache memory performs: a process of reading out of the tag address from the tag memory using the index address in a first pipeline stage; a decision process by the hit decision means in a second pipeline stage after the first pipeline stage; and a process of controlling of an update of the tag memory and controlling of the selecting means in order to input the tag address held by the latching means to the hit decision means by bypassing the tag memory in the third pipeline stage, and an update process of the data memory by reading out of the data from the low-speed memory and a process of outputting of the data read out from the low-speed memory to the processor in a fourth pipeline stage just after the third pipeline stage or in a later pipeline stage when the hit decision result at the second pipeline stage is the miss hit.
Priority Claims (1)
Number Date Country Kind
2006-288862 Oct 2006 JP national