This application is based upon and claims the benefit of priority from Japanese patent application No. 2007-338860, filed on Dec. 28, 2007, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to an apparatus and a method of controlling a load/store queue which stores a request to be issued to a main memory unit, and more particularly to an apparatus and a method for controlling the load/store queue provided between a cache memory (hereafter “cache”) and the main memory unit.
2. Description of Related Art
In recent years, when a load/store request is issued from a processor to a cache or when the load/store request is issued from a cache to a main memory unit, a load/store queue is used to conceal an access latency and a difference of a data transfer performance between the processor and the cache, or the cache and the main memory unit. The load/store queue has been provided between the processor and the cache, or between the cache and the main memory unit.
For example, the following techniques have been used for improving the access latency and the data transfer performance of the load/store queue.
(1) If a store request waiting to be issued in a store queue is followed by a load request including the same address as that of the store request, then the load access request is not issued to the cache or the main memory unit. Instead, a data in the store queue waiting to be issued by the store request is replied (returned) as the load access result, thereby reducing the access time.
(2) Another technique is that a load request taking more processing time than a store request is issued antecedent to the store request which is stored in the queue antecedent to the load request.
(3) If a store request is followed by a request including a same address as that of the preceding store request, then the store request is compressed by replacing or merging the store data. Methods for speeding up these functions have also been proposed.
In Patent Document 1, a technique related to the load/store queue installed between the processor and the cache is described. In Patent Document 1, when a store data is not ready for issue after a store request is issued, if the store request does not include a same address as that of a load request which is issued after the store request, then an issuing order is changed in a load/store queue to issue the load request antecedent to the store request. In other words, in Patent Document 1, when an issuance of the store request is delayed, the load request which includes an address different than that of the store request is issued antecedent to the store request.
A technique for merging store requests which include a same address is described in Patent Document 2.
Patent Documents 3 and 4 propose a speed-up method related to a load request following a store request including the same address.
[Patent Document 1]: Japanese Patent Laid-Open No. 06-131239
[Patent Document 2]: Japanese Patent Laid-Open No. 01-050139
[Patent Document 3]: Japanese Patent Laid-Open No. 2000-259412
[Patent Document 4]: Japanese Patent Laid-Open No. 2002-287959
According to one exemplary aspect of the present invention, an apparatus includes a queue element which stores a plurality of memory access requests to be issued to a memory device, the memory access requests including a store request and a load request, and a controller which changes an order of the store and load requests so that the order includes a store request string and a load request string.
According to another exemplary aspect of the present invention, a method includes storing a plurality of memory access requests to be issued to a memory device in a queue element, the memory access requests including a store request and a load request, and changing an order of the store and load requests so that the order includes a store request string and a load request string.
Other exemplary aspects and advantages of the invention will be made more apparent by the following detailed description and the accompanying drawings, wherein:
All techniques described in Patent Documents 1 to 4 relate to the load/store queue installed between the processor and the cache. However, these techniques do not intend to improve a performance and reduce a power consumption with respect to an access latency and data transfer by using characteristics of the main memory unit e.g., a DRAM, a synchronous DRAM, a DIMM or a SIMM using the DRAM or the synchronous DRAM.
In the main memory unit, a bus switching cycle is required between a read cycle and a write cycle. Regarding a load/store queue which is installed between the processor and the cache, the bus switching cycle is not a problem for (has no affect on) the access latency and the performance of the data transfer. That is, there is no problem since a request issued from the processor does not directly access the main memory unit.
On the contrary, regarding the load/store queue installed between the cache and the main memory unit, if a DRAM or a synchronous DRAM for use in the main memory unit is connected through a bidirectional bus, the bus switching cycle is required between a read cycle and a write cycle. Therefore, when a bus switching is performed (i.e., when the bus is switched) from a load request to a store request, or from a store request to a load request, one or more null cycles are necessary for avoiding a bus contention. For example, when requests are issued alternatively in the order of a load request, a store request, a load request, a store request, and so on, a null cycle occurs at every cycle. The occurrence of the null cycle becomes a problem for the access latency and the performance of the data transfer.
In the present invention, the occurrence of the null cycle is suppressed. Therefore, the access latency and the performance of the data transfer are improved.
As shown in
The cache 20 newly issues a request 50 to the load/store queue 10. The request 50 includes request type information (LD/ST) 41 indicating whether the request is the load request or the store request, an address 42 specifying data to be used by the request, and store data 48 to be stored in the main memory unit 30.
The load/store queue 10 includes a request queue 11 which actually issues a request to the main memory unit 30, a store data queue 12 which holds the store data 48, and a reply queue 13 which holds reply information (LD request reply information 49) in response to the load request. The load/store queue 10 may further include a load queue which holds load data, although it is omitted in
The load requests and the store requests which are issued in random order are sorted in the load/store queue 10 such that an order of the load and store requests become a string of the load requests and a string of the store requests (i.e., load requests are sequentially grouped together and store requests are sequentially grouped together). A control information 43 is added to the request 50 which is newly issued from the cache 20 to the load/store queue 10 for sorting the requests 50 in the load/store queue 10.
Regarding a queue in the load/store queue 10, a queue closer to the main memory unit 30 is defined as a preceding queue, and a request which is newly issued to the load/store queue 10 is moved to the preceding queue in the load/store queue 10.
The load and store requests which are sorted in the load/store queue 10 are issued to the main memory unit 30. When the request is the store request, the store data 48 is transferred to the main memory unit 30 and is stored in the specified address 42. When the request is the load request, load data and LD request reply information 49 about the load request are transferred from the main memory unit 30 to the load/store queue 10. When there are load requests which designate a same memory address in the request queue 11, the requests are compressed (by replacing or merging the data) into one request and issued to the main memory unit 30. However, the LD request reply information 49 holds the load request information which is not compressed. The load data from the main memory unit 30 is checked against the LD request reply information, and replied (e.g., returned) for each load request from the cache 20.
For example, the request queue 11 and the reply queue 13 in the load/store queue 10 may be made of a flip-flop (FF), and the store data queue 12 may be made of a random access memory (RAM). The main memory unit 30 may be made of a DRAM or a synchronous DRAM (SDRAM), and furthermore, may be made of a dual in-line memory module (DIMM) or a single in-line memory module (SIMM) using these DRAMs.
As shown in
The STwait 46 and the STwaitV 45 are used for retaining the store request in the load/store queue 10 until a predetermined condition is satisfied. For example, as a predetermined condition, when a number of the requests subsequent to the store request becomes a predetermined value, the store requests retained in the load/store queue 10 are issued to the main memory unit 30
The number of requests subsequent to the store request is counted by using the STwait 46. When the count value reaches a predetermined value, the STwaitV 45 is set. When the STwaitV 45 is set, the store requests retained in the load/store queue 10 are issued to the main memory unit 30. In other words, the store requests are retained in the load/store queue 10 without being issued to the main memory unit 30 until the STwait 46 reaches the predetermined value and the STwaitV 45 is set.
The store requests are retained in the load/store queue 10 until the number of requests newly issued to the load/store queue 10 reaches the predetermined value. So, many store requests are retained in the load/store queue 10 when there are the store requests, which include the same designated memory address, subsequent to the preceding store request. Therefore, the store requests are merged efficiently, and the string of the store requests are issued to the main memory unit 30 separately from the string of the load requests.
The adjacent address flag code 47 may be used for sorting the requests which are stored in the load/store queue 10 based on a predetermined unit of processing of addresses in the main memory unit 30. The addresses in the main memory unit 30 are divided into a plurality of units of processing, and the adjacent address flag code 47 is identification information indicating any one of the units of processing. For example, the predetermined unit may mean (e.g., represent) that a rank of the memory unit 30, a row address of the memory unit 30, etc. The requests may be sorted and controlled according to the address corresponding to the predetermined units by assigning the adjacent address flag code 47 to each of the requests which are stored in the load/store queue 10. For example, the same adjacent address flag code 47 may be assigned to a request including the same row address, and the same adjacent address flag code 47 may be assigned to a request including the same rank address.
The addresses of the requests which are stored in the load/store queue 10 are compared with addresses corresponding to newly issued requests. And, the requests which are stored in the load/store queue 10 and the newly issued requests are classified by adding the adjacent address flag code 47 to these requests. When the request which is stored in the load/store queue 10 is issued to the main memory unit 30, the requests including the same adjacent address flag code 47 are issued collectively and continuously to the main memory unit 30.
For example, when the load request is issued to the main memory unit 30 by a memory request selection unit (MRSU 15), a multiplexer may be controlled so as to select the load requests including the same adjacent address flag code 47 in the load/store queue 10 and continuously issue all the load requests which include the same adjacent address flag code 47.
When the store request is issued to the main memory unit 30, at first, the store request whose STwaitV 45 is set is issued to the main memory unit 30, and then, the store requests including the same adjacent address flag code 47 are continuously issued to the main memory unit 30. However, the store request may be issued before the STwaitV 45 is set or the store request may be selected and issued thereto from the store requests whose STwaitV 45 is set.
The same address request control unit 14 executes the below mentioned procedures, (i) to (iii), regarding the requests including the same address.
(i) When the preceding load request includes the same address as that of the following load request, these load requests are combined into one load request and then issued to the main memory unit 30. The address of a newly issued load request is compared with the addresses of all the load requests which are stored in the load/store queue 10. When the load request including the same address is found, only one of the load requests is placed in the request queue 11 and a plurality of the LD request reply information 49 (the number of the LD request reply information 49 corresponding to the number of the load requests including the same address), are placed in the reply queue 13. A function which is described in (i) is generally implemented in the cache 20. However, if the request source to the load/store queue 10 is not the cache 20, then the present control function may be implemented in the load/store queue 10.
(ii) When the address of the preceding store request is the same as that of the following store request, the two pieces of store data of the store requests are merged into one piece of store data. The address of preceding store request is compared with the address of the following store request for all the store requests which are stored in the load/store queue 10. When the store request including the same address is found, the store data 48 of the following store request is merged into the store data 48 of the preceding store request. In this case, only one store request is placed in the request queue 11 and one piece of merged store data 48 is held in the store data queue 12.
(iii) When the address of the preceding store request is the same as that of the following load request, the content of the store data 48 which is held in the store data queue 12 is replied (returned) as the data of the following load request. The address of the newly issued load request is compared with the addresses of all the store requests which are stored in the load/store queue 10. When a store request including the same address is found, the content of the store data 48 which is held in the store data queue 12 is replied (returned) as the load result to the cache.
According to the first exemplary embodiment, the requests which are stored in the load/store queue 10 are sorted so that the order of the requests becomes a string of the load requests and a string of the store requests.
As shown in
It should be noted that in the case where there is a preceding store request including the same address as that of a load request in the request queue, and a part of load data is not present as store data, if the store data of the store request may not be replied (returned) as data of the load request to the cache, the store request needs to be issued to the main memory unit ahead of the load request without waiting for a predetermined number of times. However, the control is not the essence of the present invention and thus the description is omitted. In addition, the first exemplary embodiment shows a control such that if a store request is present in an immediately preceding queue of the store request in the request queue and the immediately preceding queue is issued to the main memory unit, then the request is issued to the main memory unit without waiting for a store request to be issued. The immediately preceding queue means that the order of the immediately preceding queue is one step prior to the queue of the store request which is newly issued to the request queue.
First, the values of the valid information V44, the STwaitV 45, and the STwait 46 of the newly issued request (V=1, STwaitV=0, STwait=0) are initialized (step S101). Next, the request type information (LD/ST) 41 is checked to judge whether the request is the load request or not (step S102).
If the request is not the load request (i.e., the store request) as a result of determination in step S102, then the process goes to step S108. On the contrary, if the request is the load request, then a determination is made to see whether there is a valid request (the request with V=1) in the preceding queue in the load/store queue 10 (step S103) or not. In other words, a confirmation is made to see whether there is a preceding valid request in the load/store queue 10 or not.
If there is not a valid request in the preceding queue as a result of determination in step S103, then the process goes to step S118. On the contrary, if there is a valid request in the preceding queue, a further determination is made to see whether the request is placed in the nearest preceding queue or not (step S104). In other words, a confirmation is made to see whether the request is the next to be issued to the main memory unit 30.
If the request is placed in a nearest preceding queue, which is the most nearest queue of the load/store queue 10 with respect to the main memory unit 30, as a result of determination in step S104, then the process goes to step S118. On the contrary, if the request is not placed in the nearest preceding queue, then a further determination is made to see whether the requests in the preceding queue are all store requests or not (step S105). In other words, a confirmation is made to see whether the requests preceding the load request are all store requests or not.
If the requests in the preceding queue are all store requests as a result of determination in step S105, then the process goes to step S118. On the contrary, if the requests in the preceding queue are not all store requests, a further determination is made to see whether a request in the immediately preceding queue is a valid request (the request with V=1) (step S106). In other words, a confirmation is made to see whether there is an immediately preceding valid request in the load/store queue 10.
If the request in the immediately preceding queue is a valid request as a result of determination in step S106, then the process returns to step S106 again. In other words, if there is the immediately preceding valid request in the load/store queue 10, then the process is kept in standby until the preceding request becomes invalid. On the contrary, if the request in the immediately preceding queue is not the valid request, then the request is moved to the preceding queue and the process returns to step S103 (step S107).
If the request is not the load request (i.e., the store request) as a result of determination in step S102, then a further determination is made to see whether the immediately preceding request is the store request or not (step S108). In other words, a confirmation is made to see whether immediately preceding request is the store request or not.
If the immediately preceding request is not a store request as a result of determination in step S108, then the process goes to step S110. On the contrary, if the immediately preceding request is the store request, then a further determination is made to see whether the immediately preceding request is issued to the main memory unit 30 or not (step S109). In other words, a confirmation is made to see whether the immediately preceding store request has been issued or not.
If the immediately preceding store request has been issued to the main memory unit 30 as a result of determination in step S109, then the process goes to step S118. On the contrary, if the immediately preceding store request is not issued to the main memory unit 30, then a further determination is made to see whether a new request is issued to the load/store queue 10 or not (step S110). In other words, a confirmation is made to see whether the new request subsequent to the store request is issued or not.
If the new request is not issued as a result of determination in step S110, then the process returns to step S110. In other words, the process is kept in standby until the subsequent request is issued to the load/store queue 10. On the contrary, if the new request is issued, then the value of the STwait 46 is incremented (STwait=+1) (step S111). That is, the number of subsequent requests is counted.
Next, a determination is made based on the value of the STwait 46 to see whether the store request waits for a predetermined number of times in the load/store queue 10 (step S112). In other words, a confirmation is made to see whether the store request is in a ready-to-be-issued state or not.
If the store request does not wait for a predetermined number of times as a result of determination in step S112, then the process returns to step S110. The number of subsequent requests is counted, the store requests are retained in the load/store queue 10 until the count value reaches the predetermined number of times. On the contrary, if the store request waits for the predetermined number of times, then the value of the STwaitV 45 is changed to a valid value (STwaitV =1) (step S113). In other words, the store request is placed in a ready-to-be-issued state.
Next, a determination is made to see whether there is a valid request (the request with V=1) in the preceding queue in the load/store queue 10 or not (step S114). As a result of determination, if there is not the valid request in the preceding queue, then the process goes to step S118. On the contrary, if there is the valid request in the preceding queue, then a determination is made to see whether the request is placed in the nearest preceding queue (step S115).
If the request is placed in the nearest preceding queue as a result of determination in step S115, then the process goes to step S118. On the contrary, if the request is not placed in the nearest preceding queue, then a further determination is made to see whether the request in the immediately preceding queue is a valid request (here, a request with V=1) (step S116).
If the request in the immediately preceding queue is the valid request as a result of determination in step S116, then the process returns to step S116 again. If there is the immediately preceding valid request in the load/store queue 10, then the process is kept in standby until the preceding request becomes invalid. On the contrary, if the request in the immediately preceding queue is not the valid request, then the request is moved to the preceding queue and the process goes to step S108 (step S117).
On the contrary, if the request is ready to be issued to the main memory unit 30 as a result of determinations in steps S103, S104, S105, S109, S114, and S115, then the value of the valid information V44 of the request is cleared (V=0) (step S118). Then, the request is issued from the load/store queue 10 to the main memory unit 30 (step S119), and the entry is released from the request queue.
As described above, the store requests may be continuously retained in the load/store queue 10 by holding the store requests in the load/store queue 10, without being issued to the main memory unit 30 until the number of subsequent requests reaches the predetermined number and by reordering the load requests following the store request ahead of the store request. Therefore, when the request is issued from the load/store queue 10, to the main memory unit 30, the store requests may be continuously issued and the load requests between the store requests may also be continuously issued.
Accordingly, the requests may be efficiently issued to the main memory unit 30 by suppressing the null cycle from occurring in the bus switching between the read cycle and the write cycle, thereby providing performance improvement and low power consumption with respect to access latency and data transfer.
Next, a second exemplary embodiment of the present invention will be described in detail with reference to drawings.
According to the second exemplary embodiment, a determination is made to see whether the address of the request in the load/store queue 10 is included in the same unit of processing in the main memory unit 30 or not And, when the request is issued to the main memory unit 30, the requests including an address included in the same unit are issued collectively and continuously.
First, the value (V=1) of the valid information V44 of the newly issued request is initialized (step S201). Next, the addresses of all the requests existing in the preceding queue in the load/store queue 10 and the address of the newly issued request are compared with each other (step S202).
As a result of determination in step S202, a determination is made to see whether there is the request including the same row address or the same rank address as the address of the newly issued request in the preceding queue of the load/store queue 10 (step S203). A confirmation is made to see whether there has already been the request in the preceding queue of the load/store queue 10, the request preceding the newly issued request. And, a confirmation is made to see whether the address of the request preceding to the newly issued request includes the same row address or the same rank address as the address of the newly issued request.
If there has already been the request including the same row address or the same rank address as the address of the newly issued request as a result of determination in step S203, then the adjacent address flag code 47 is assigned to the newly issued request (step S210). The adjacent flag code 47 is the same as that assigned to the request including the same row address or the same rank address. On the contrary, if there is no request including the same row address or the same rank address as the address of the newly issued request, then the new adjacent address flag code 47 is created and is assigned to the newly issued request (step S204).
Next, a determination is made to see whether there is the valid request (the request with V=1) in the preceding queue in the load/store queue 10 or not (step S205). If there is not a valid request in the preceding queue as a result of determination, then the process goes to step S211. On the contrary, if there is a valid request in the preceding queue, then a further determination is made to see whether the request is placed in the nearest preceding queue or not (step S206).
If the request is placed in the nearest preceding queue as a result of determination in step S206, then the process goes to step S211. On the contrary, if the request is not placed in the nearest preceding queue, then a further determination is made to see whether a request including the same adjacent address flag code 47 is issued to the main memory unit 30 or not (step S207). A confirmation is made to see whether the requests including the same adjacent address flag code 47 are issued collectively and continuously.
If the request including the same adjacent address flag code 47 is issued to the main memory unit 30 as a result of determination in step S207, then the process goes to step S211. On the contrary, if the request including the same adjacent address flag code 47 is not issued to the main memory unit 30, then a further determination is made to see whether the request in the immediately preceding queue is the valid request (V=1) or not (step S208).
If the request in the immediately preceding queue is the valid request as a result of determination in step S208, then the process returns to step S208 again. On the contrary, if the request in the immediately preceding queue is not the valid request, then the request is moved to the preceding queue, and the process goes to step S205 (step S209).
On the contrary, if the request is ready to be issued to the main memory unit 30 as a result of determinations in step S205, S206, and S207, then the value of the valid information V44 of the request is cleared (V=0) (step S211). Then, the request is issued from the load/store queue 10 to the main memory unit 30 (step S212).
As described above, when the address of the request which is stored in the load/store queue 10 is the same as the row address or the rank address, the request including the same adjacent address flag code 47 is issued to the main memory unit 30. Thereby, the main memory unit 30 may be continuously accessed by the same row address or by the same rank address. Therefore, when a request is issued from the load/store queue 10 to the main memory unit 30, an RAS (Row Address Strobe) may be activated only once for one transfer of the same row address, thereby reducing the number of RAS activations.
In addition, in the case where a DIMM or the like is used in the main memory unit 30, and higher-speed access can be provided for the case where the same rank address accesses continue than the case where different rank address accesses continue, the same rank address accesses may continue, and thus the process of the main memory unit 30 may be sped up.
According to the third exemplary embodiment, not only the requests which are stored in the load/store queue are sorted so that the order of the requests become a string of the load requests and a string of the store requests, but also the request including the address of the same unit of the main memory unit 30 is issued together as well if the address of a request in the load/store queue 10 is the address of the same unit of processing in the main memory unit 30.
As described in the exemplary flowchart shown in
Then, as described in the exemplary flowchart shown in
By doing so, when a request is issued from the load/store queue 10 to the main memory unit 30, continuous store requests and continuous load requests may be efficiently issued, thereby providing performance improvement and low power consumption.
Also, when the store request is issued to the main memory unit 30, first the store request whose STwaitV 45 is set may be issued to the main memory unit 30, and then, the store request including the same adjacent address flag code 47 may be issued continuously. Or, the store request may be issued to the main memory unit 30 before the STwaitV 45 is set.
As described in the exemplary flowchart shown in
Then, as described in the exemplary flowchart shown in
According to the fourth exemplary embodiment, as described in the exemplary flowchart shown in
By retaining the store requests until the predetermined condition is satisfied, many store requests are retained in the load/store queue 10. Accordingly, it is possible to improve the merge probability of the store requests and to efficiently issue the store request to the main memory unit 30.
According to a fifth exemplary embodiment, as described in the exemplary flowchart shown in
By retaining the store requests without being issued, the requests are sorted so that the order of the requests becomes the string of the store requests and the string of the load requests. By retaining as many store requests as possible in the load/store queue 10, the probability that subsequent load requests including the same address as that of the preceding store request, is increased. Therefore, the requests may be issued more efficiently to the main memory unit 30.
According to a sixth exemplary embodiment, as described in the flowchart shown in
By retaining the store requests without being issued, the requests are sorted so that the order of the requests becomes the string of the store requests and the string of the load requests. By placing only one load request which includes the same address, the load request may be further efficiently issued to the main memory unit 30.
The control unit 110 controls the order of the requests so that the order of the requests becomes the string of the load requests and the string of the store requests by sorting the requests which are stored in the load/store queue 10. The control unit 110 includes a store request control unit 120, a load request control unit 130, a request determination unit 140, and an address determination unit 150. The store request control unit 120 further includes a request measurement unit 121.
The store request control unit 120 retains the store requests in the load/store queue 10 until the predetermined condition is satisfied. For example, the store request control unit 120 retains the store requests in the load/store queue 10 until the number of requests newly issued to the load/store queue 10 reaches the predetermined number. The store request control unit 120 counts the number of requests issued after the store request by the request measurement unit 121, and retains the store requests in the load/store queue 10 until the number of the count value reaches the predetermined number. The store request control unit 120 may be controlled so as to retain the store requests in the load/store queue 10, in the load/store queue 10 for a predetermined time.
The load request control unit 130 sorts the load requests subsequent to the store requests which are retained in the load/store queue 10 so that the load requests become ahead of the store requests which are retained in the load/store queue 10.
The load request control unit 130 uses the request determination unit 140 to determine whether the request ready to be issued from the load/store queue 10 to the main memory unit 30 is the store request or the load request. If the request is the store request, then the store request is retained in the load/store queue 10.
The control unit 110 uses the address determination unit 160 to determine whether the address of a first request and the address of a second request in the load/store queue 10 are the addresses included in the same unit of processing in the main memory unit 30. If the address of the first request and the address of the second request are the addresses included in the same unit of processing in the main memory unit 30, then when the first request is issued to the main memory unit 30, the second request is also issued together to the main memory unit 30.
In the above described exemplary embodiments 1 to 6, the store request is retained in the load/store queue 10 until the predetermined condition is satisfied. However, the present invention is not limited to these exemplary embodiments. For example, if the request is ready to be issued to the main memory unit 30, then a determination is made to see whether the request is the store request or the load request. As a result of the determination, if the request is the store request, then the store request may be retained in the load/store queue 10. In the above described exemplary embodiments, the store request is retained according to the number of subsequent requests. However, the store request may be retained according to the time (duration) that the store request is present in the load/store queue 10.
In the present invention, the order of the requests is sorted so that the order becomes the string of the store requests and the string of the load requests, and then the requests are issued to the main memory unit 30 according to the sorted order. Accordingly, the present invention may provide performance improvement and low power consumption with respect to access latency and data transfer.
Further, it is noted that applicant's intent is to encompass equivalents of all claim elements, even if amended later during prosecution.
Number | Date | Country | Kind |
---|---|---|---|
2007-338860 | Dec 2007 | JP | national |