This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2012-190441, filed on Aug. 30, 2012, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein relates to a processor and a control method of a processor.
There has been known a cache system capable of transferring an exclusive right in a clean state, such as one adopting the MESI (Modified, Exclusive, Shared, Invalid) protocol. The individual states of the MESI protocol are as follows. The M (Modified) state represents a state where only the cache memory in question, and none of the other requestors, holds the data with the exclusive right. The data is different from the data stored in the low-order cache memory (or the memory). The data may be modified by an arbitrary store operation from this state, with the cache memory remaining in the M state. Before the line changes to the I state, the low-order cache memory (or the memory) needs to be updated with the data held by the cache memory (write-back).
The E (Exclusive) state represents a state where only the cache memory in question, and none of the other requestors, holds the data with the exclusive right. The data is the same as the data held by the low-order cache memory (or the memory). The data may be modified by an arbitrary store operation, and the cache memory changes into the M state upon modification of the data. The S (Shared) state represents a state where the cache memory holds the data without the exclusive right. The data is the same as the data in the low-order cache memory (or the memory). If there is a plurality of requestors, the plurality of requestors may be in the S state (shared state) at the same time. To store, the cache memory needs to acquire the exclusive right and change into the E state. The I (Invalid) state represents that the cache memory holds no data.
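By way of illustration, the four states and the write-back obligation described above may be sketched in C as follows. This is a minimal model; the type and function names are invented for illustration and are not part of any embodiment.

```c
#include <stdbool.h>

/* Illustrative MESI model; the names are invented, not the embodiment's. */
typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_t;

/* A store is permitted only while the exclusive right is held (E or M). */
static bool can_store(mesi_t s) { return s == MESI_E || s == MESI_M; }

/* Storing in the E state changes the line into the M state; storing in the
 * M state keeps it there. Otherwise the exclusive right must first be
 * acquired before the store may proceed. */
static mesi_t on_store(mesi_t s) { return can_store(s) ? MESI_M : s; }

/* Only the M state holds data newer than the low-order memory, so only
 * the M state requires a write-back before the line may become invalid. */
static bool needs_writeback(mesi_t s) { return s == MESI_M; }
```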
In this sort of cache system, when a certain data block is held by the L2 (Level-2) cache memory in the E state or the M state, but by none of the requestors, there are, upon issuance of a load request from a certain requestor, two ways of making a response to the requestor: "response in the E state" and "response in the S state". The paragraphs below explain an exemplary system having a plurality of CPU cores as processing sections, each of which has a calculation section and an L1 (Level-1) cache memory, while the individual cores share an L2 (Level-2) cache memory. The individual CPU cores correspond to the requestors, and the L2 cache memory is the responder. In the description below, the low-order cache memory is assumed to be in the E state.
In the case of the "response in the E state", as illustrated in flow charts of the drawings, the requesting core receives the data with the exclusive right, and may thereafter store the data, changing into the M state, without issuing any request for acquiring the exclusive right to the L2 cache memory. On the other hand, if another core later issues a load request for the same data block, a snoop has to be issued to the core holding the data in order to change its state before the data can be transferred.
On the other hand, as illustrated in the drawings, in the case of the "response in the S state", another core may load the same data block without any snoop being issued, since no core holds the data with the exclusive right. In exchange, when a core stores the data, it first has to issue a request to the L2 cache memory for acquiring the exclusive right.
As described above, the two cases involve a performance trade-off. The cache system is generally designed so that, when a certain core issues a load request while no core holds the data, the response is made in the E state, on the assumption that only that core will use the data.
There has also been proposed a method of controlling a cache memory in which a change flag is set when data is written by the processor, and the change flag is reset, by a specific command, when the data is read out from the processor (for example, refer to Patent Document 1).
Problems will, however, arise in a cache system designed to make the response in the E state when a load request is issued while none of the cores holds the data, in the following case: a certain address is repeatedly referenced by a plurality of cores, the corresponding cache lines are replaced, and thereby all the cores are brought into a state of holding no data. As long as any one core holds the data, there is no problem, since the response will be made in the S state. Very frequent replacement of the caches in the cores, however, may result in the case below, which will be explained referring to the drawings.
Exemplary cases contrasting these behaviors, including the behavior according to the embodiment described below, are illustrated in the drawings.
According to one aspect, a processor includes a plurality of processing sections, each of which includes a first cache memory, executes processing, and issues requests; and a second cache memory. When a request received from any one of the plurality of processing sections requests target data held by none of the first cache memories contained in the plurality of processing sections, and the request is a load request that permits a processing section other than the processing section having sent the request to hold the target data, the second cache memory makes a response to the processing section having sent the request with the target data, together with non-exclusive information indicating that the target data is non-exclusive. When the request is a load request that forbids a processing section other than the processing section having sent the request to hold the target data, the second cache memory makes a response to the processing section having sent the request with the target data, together with exclusive information indicating that the target data is exclusive.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
An embodiment will be described below, referring to the attached drawings.
The request receiving sections 14 (14-0 to 14-n) are provided correspondingly to the individual cores 11 (11-0 to 11-n), and receive requests, such as load requests and store requests, from the cores 11. The requests received by the individual request receiving sections 14 are sent to the priority control section 15. The priority control section 15 selects the request to be input to the tag control section (pipeline) 16, typically according to the LRU (Least Recently Used) algorithm, and outputs it. The tag control section (pipeline) 16 directs the tagged memory 17 to read the tag (TAG), and receives the tag hit (TAG HIT) information obtained from the processing by the hit decision section 18. The tag control section (pipeline) 16 also outputs the tag hit information and the request fed by the priority control section 15 to the response decision section 19.
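As a rough software analogy of the priority control section 15, LRU selection among the request ports might look like the following C sketch. The port count and the age bookkeeping are assumptions for illustration, not details of the embodiment.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 8 /* assumed: one request receiving section per core */

/* Grant the pending port least recently selected. age[i] counts cycles
 * since port i was last granted; the oldest pending port wins. */
static int select_request(const bool pending[NUM_PORTS], uint32_t age[NUM_PORTS])
{
    int pick = -1;
    for (int i = 0; i < NUM_PORTS; i++)
        if (pending[i] && (pick < 0 || age[i] > age[pick]))
            pick = i;
    for (int i = 0; i < NUM_PORTS; i++)
        age[i]++;          /* every port ages by one cycle... */
    if (pick >= 0)
        age[pick] = 0;     /* ...except the one just granted */
    return pick;           /* -1 when nothing is pending */
}
```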
The tagged memory 17 holds tag data regarding the data held by the data memory 23. The tag data contains information regarding the states of the individual cache memories, and information regarding which core 11's L1 cache memory 12 holds the data. An exemplary configuration of the data held in the tagged memory 17 is described below.
The address tag 101 is tag information regarding the address of the data held in the data memory 23. The state information (L2-STATE) 102 of the L2 cache memory is 2-bit information indicating the state of the L2 cache memory. In this embodiment, it is defined that value "0" (00b) represents the I state, value "1" (01b) represents the S state, value "2" (10b) represents the M state, and value "3" (11b) represents the E state.
The state information (L1-STATE) 103 of the L1 cache memory is 2-bit information indicating the state of the L1 cache memories. In this embodiment, it is defined that value "0" (00b) represents that no core holds the data (I), value "1" (01b) represents that one core holds the data in the S state (S), value "2" (10b) represents that two or more cores hold the data in the S state (SHM), and value "3" (11b) represents that one core holds the data in the E state (E). The data holding information (L1-PRESENCE) 104 of the L1 cache memory is information regarding which core holds the data. In this embodiment, the information has 8 bits corresponding to 8 cores, where a core holding the data is assigned value "1" and a core not holding the data is assigned value "0". Accordingly, which core holds the data may be expressed uniquely, based on the combination of the state information (L1-STATE) 103 of the L1 cache memory and the data holding information (L1-PRESENCE) 104.
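The entry just described may be pictured as the following C struct. The field widths follow the text above, while the address-tag width and the helper function are assumptions for illustration.

```c
#include <stdint.h>

/* One entry of the tagged memory 17, following the widths given above.
 * The 30-bit address-tag width is an assumption for illustration. */
struct tag_entry {
    uint32_t address_tag : 30; /* address tag 101 */
    uint32_t l2_state    : 2;  /* L2-STATE 102: 0=I, 1=S, 2=M, 3=E */
    uint32_t l1_state    : 2;  /* L1-STATE 103: 0=I, 1=S, 2=SHM, 3=E */
    uint32_t l1_presence : 8;  /* L1-PRESENCE 104: bit i set if core i holds the data */
};

/* Example: does core 'core_id' (0..7) hold the data of this entry? */
static int core_holds(const struct tag_entry *e, int core_id)
{
    return (e->l1_presence >> core_id) & 1u;
}
```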
The hit decision section 18 compares the pipeline address based on the request fed by the priority control section 15 with the tag data read out from the tagged memory 17, and determines whether the L2 cache memory contains any data corresponding to the pipeline address.
According to an L2 cache index 112 corresponding to the pipeline address of the thus-fed request, the address tag 101 of each way, the state information (L2-STATE) 102 of the L2 cache memory, the state information (L1-STATE) 103 of the L1 cache memory, and the data holding information (L1-PRESENCE) 104 are read out from the tagged memory 17.
The state information (L2-STATE) 102 of the L2 cache memory of each way is subjected to logical disjunction by an OR circuit 115; if the state information (L2-STATE) 102 has a value other than "0" (00b), that is, if the state is other than the I state, the output will be "1". In other words, the OR circuit 115 corresponding to a way holding valid data outputs value "1". The address comparing section 116 compares the address tag 101 of each way with the L2 cache tag 111 of the pipeline address and, if the two agree, outputs value "1". The output of the OR circuit 115 and the output of the address comparing section 116 are then subjected to logical conjunction by an AND circuit 117, and the result is output as the way information. In other words, only the AND circuit 117 corresponding to the way identified by the cache hit outputs value "1".
The OR circuit 118 subjects the outputs of the individual AND circuits 117 to logical disjunction, and outputs the result as the signal TAG HIT. Meanwhile, the state information (L2-STATE) 102 of the L2 cache memory of the way identified by the cache hit is selected by an AND circuit 119 and an OR circuit 120, and is output as the state information (L2-STATE) of the hit way. Similarly, the state information (L1-STATE) 103 of the L1 cache memory of the hit way is selected by an AND circuit 121 and an OR circuit 122 and output as the state information (L1-STATE), and the data holding information (L1-PRESENCE) 104 of the hit way is selected by an AND circuit 123 and an OR circuit 124 and output as the data holding information (L1-PRESENCE).
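In software terms, the valid-check, compare, and select network described above behaves like the following C routine. This is a sketch reusing the tag_entry struct from the earlier example; the way count is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 16 /* assumed associativity, for illustration only */

struct hit_result {
    bool     tag_hit;     /* signal TAG HIT (OR circuit 118) */
    int      way;         /* way identified by the cache hit */
    uint32_t l2_state;    /* selected L2-STATE (circuits 119, 120) */
    uint32_t l1_state;    /* selected L1-STATE (circuits 121, 122) */
    uint32_t l1_presence; /* selected L1-PRESENCE (circuits 123, 124) */
};

static struct hit_result hit_decide(const struct tag_entry set[NUM_WAYS],
                                    uint32_t l2_cache_tag)
{
    struct hit_result r = { false, -1, 0, 0, 0 };
    for (int w = 0; w < NUM_WAYS; w++) {
        bool valid = set[w].l2_state != 0;               /* OR circuit 115 */
        bool match = set[w].address_tag == l2_cache_tag; /* comparator 116 */
        if (valid && match) {                            /* AND circuit 117 */
            r.tag_hit     = true;
            r.way         = w;
            r.l2_state    = set[w].l2_state;
            r.l1_state    = set[w].l1_state;
            r.l1_presence = set[w].l1_presence;
        }
    }
    return r;
}
```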
Referring now back to the overall configuration, the operation of the response decision section 19 will be described.
If the state of the other cores is the I state, the response decision section 19 confirms whether the thus-issued load request is LD(S) or LD(E). The response decision section 19 sets the response state for the requesting core to the S state if the load request is LD(S), that is, a load request which permits the other cores to hold the target data, and sets the response state to the E state if the load request is LD(E), that is, a load request which forbids the other cores to hold the target data. As described above, in this embodiment, the response decision section 19 selects the response state according to the type of the load request, as sketched below.
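The decision described above reduces to a simple rule. The following C sketch, with invented request and response codes that are not the embodiment's encodings, shows how the response state would be chosen when no other core holds the data:

```c
/* Illustrative request and response codes; the names are assumptions. */
typedef enum { REQ_LD_S, REQ_LD_E } req_code_t;
typedef enum { RESP_S, RESP_E } resp_state_t;

/* Response decision for a load that hits in the L2 cache while the state
 * of the other cores is I (no other core holds the data): LD(S) permits
 * other cores to hold the data later, so it is answered in the S state;
 * LD(E) forbids that, so it is answered in the E state. */
static resp_state_t decide_response(req_code_t req)
{
    return (req == REQ_LD_S) ? RESP_S : RESP_E;
}
```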
The tag state decoding section 131 receives the state information (L2-STATE) of the L2 cache memory, the state information (L1-STATE) of the L1 cache memory, and the data holding information (L1-PRESENCE) corresponding to the tag hit information fed by the tag control section (pipeline) 16. The tag state decoding section 131 decodes them, and outputs the result of decoding to the update tag state creating section 133, the response state creating section 134, and the snoop request creating section 135. The request code decoding section 132 receives and decodes the request type code (REQ-CODE) contained in the request fed by the tag control section (pipeline) 16, and outputs the result of decoding to the update tag state creating section 133, the response state creating section 134, and the snoop request creating section 135.
The update tag state creating section 133 determines the presence or absence of a tag update, and the tag state after the update, according to exemplary operations illustrated in the drawings.
The response state issuing section 20 issues the response state through a bus to the core 11, based on the response instruction and the response state received from the response decision section 19. Based likewise on the response instruction and the response state, the response data issuing section 21 issues the data output by the data memory 23 according to the way information fed by the hit decision section 18, as the response data, through a response data bus to the core 11. The snoop issuing section 22 issues the snoop request through the bus to the core 11, based on the snoop instruction and the snoop request type fed by the response decision section 19.
When a cache miss occurs in the L2 cache memory 13, operations involving issuance of a request to a main memory or another CPU, reception of the response, and storage of the response in the L2 cache memory 13 take place. Constituents relevant to these operations are not illustrated.
As described above, in this embodiment, the load request LD(S), which requests a response in the S state, and the load request LD(E), which requests a response in the E state, are used as the load requests. Which of LD(S) and LD(E) is issued is directed by software. For example, since software knows whether a data block is to be modified (stored) or not, a compiler or the like can issue the appropriate instruction, emitting LD(S) for loads of data that is less likely to be modified, and LD(E) for the other loads.
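For instance, with hypothetical load intrinsics (the names ld_s() and ld_e() below are invented for illustration and belong to no real instruction set), compiler-directed selection of the two load flavors could look like this:

```c
/* Hypothetical intrinsics for the two load flavors; invented names. */
extern long ld_s(const long *p); /* LD(S): respond in the S state */
extern long ld_e(long *p);       /* LD(E): respond in the E state */

long lookup_and_count(long *counter, const long *table, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += ld_s(&table[i]);  /* read-only data: less likely to be
                                    modified, so the compiler emits LD(S) */
    long c = ld_e(counter);      /* loaded and then stored: LD(E), so the
                                    store below needs no extra transaction
                                    for acquiring the exclusive right */
    *counter = c + 1;
    return sum;
}
```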
An exemplary implementation of the load request LD(S) and the load request LD(E) will be explained below, dealing with the case where the load request LD(S) and the load request LD(E) of this embodiment are applied to an exemplary program product illustrated in the drawings.
While the description above dealt with the case where both the load request LD(S) and the load request LD(E) are newly provided, only the load request LD(S) may be newly provided in an alternative configuration, in which the response is made in the E state upon issuance of an ordinary load request LD that does not specify the response state.
As described above, a load request for data that is less likely to be stored is handled in the form of LD(S), and the response is made in the S state. In this way, the response is made in the S state even after the replacement described above, so that no snoop for transferring the exclusive right occurs when another core subsequently refers to the same data.
The L1-PF includes L1-PF(S), which requests the prefetch only for the purpose of making reference, and L1-PF(E), which requests the prefetch for storing. Accordingly, the L1-PF(S) may be used as the load request LD(S) of this embodiment, so that it is no longer necessary to newly define the load request LD(S), and this embodiment may be implemented without adding or modifying a command code. When the L1-PF(S) is used as the load request LD(S), it suffices that the request code decoding section 132 of the response decision section 19 interprets the L1-PF(S) as the load request LD(S), as sketched below.
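That reuse amounts to a single extra decode case. The following C sketch assumes illustrative raw request codes and reuses req_code_t from the earlier example:

```c
/* Illustrative raw request codes as they might arrive at the request code
 * decoding section 132; the names are assumptions. */
typedef enum { RAW_LD, RAW_L1_PF_S, RAW_L1_PF_E } raw_req_t;

/* Interpret L1-PF(S) as LD(S); everything else that loads is treated as
 * LD(E), matching the configuration where a plain load is answered in
 * the E state. */
static req_code_t decode_request(raw_req_t raw)
{
    switch (raw) {
    case RAW_L1_PF_S: return REQ_LD_S; /* prefetch for reference only */
    case RAW_L1_PF_E:                  /* prefetch for storing */
    case RAW_LD:                       /* load with no specified response */
    default:          return REQ_LD_E;
    }
}
```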
In the example illustrated in the drawings, a prefetch command P41 in the form of the L1-PF(S) is issued ahead of a command P43 that refers to the prefetched data.
The prefetch request is used to hide the latency of the L2 cache memory. Taking the latency of the L2 cache memory into account, an interval of several commands' worth (20 commands' worth, for example) may be provided between the command P41 (or the command P42, if this is added) and the command P43, as sketched below.
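Concretely, with a hypothetical shared-prefetch intrinsic (pf_s() is an invented name standing in for L1-PF(S)), the interval would be realized by issuing the prefetch some distance ahead of the consuming load:

```c
/* Hypothetical shared-prefetch intrinsic standing in for L1-PF(S). */
extern void pf_s(const void *p);

long scan(const long *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 20 < n)
            pf_s(&a[i + 20]); /* request the line in the S state roughly
                                 20 iterations ahead, enough interval to
                                 hide the L2 cache latency */
        sum += a[i];          /* by the time this load executes, the data
                                 has arrived without an E-state response,
                                 so later sharing needs no snoop */
    }
    return sum;
}
```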
There are two possible methods of expressing the load request LD(S) and the load request LD(E) using the L1-PF: a method of implementing the load request LD(E) only with load requests other than the L1-PF, and a method of implementing it also with the L1-PF(E). It is, however, better to implement the load request LD(E) only with load requests other than the L1-PF, since the L1-PF(E) preferably keeps holding the data in the E state for a future store. While the L1-PF is assumed here to be an L1-SW (software) PF designated by software, the scheme is also adaptable to an L1-HW (hardware) PF, by which the L1-PF is generated automatically upon detection of a pattern in memory access addresses.
According to this embodiment, by properly selecting which of the E state and the S state is used to respond to a load request directed to a low-order cache memory, processes such as the snoop transactions and the transfers of the exclusive right between cache states may be suppressed, and thereby the processing performance of the processor may be improved. The embodiment described above is applicable not only to cache systems based on the MESI protocol, but also to any cache system capable of transferring the exclusive right in a clean state. For example, the embodiment is also applicable to cache systems based on the MOESI protocol, the MOWESI protocol, and so forth.
According to one embodiment, upon issuance of a load request directed to a low-order cache memory, it is possible to make a response in a proper state to the requestor, whereby processes may be reduced and the processing performance of the processor may be improved.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.