This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2013-062811, filed on Mar. 25, 2013, the entire contents of which are incorporated herein by reference.
The embodiments described herein are related to an operation processing apparatus, an information processing apparatus and a method of controlling an information processing apparatus.
An operation processing apparatus has been put to practical use for sharing data stored in a main memory among a plurality of processor cores in an information processing apparatus. Plural pairs of a processor core and an L1 cache form a group of processor cores in the information processing apparatus. A group of processor cores is connected with an L2 cache, an L2 cache control unit and a main memory. A set of the group of processor cores, the L2 cache, the L2 cache control unit and the memory is referred to as a cluster.
A cache is a storage unit with small capacity which stores data used frequently among data stored in a main memory with large capacity. When data in a main memory is temporarily stored in a cache, the frequency of access to the memory, which is time-consuming, is reduced. The cache employs a hierarchical structure in which processing at higher speed is achieved in a higher level and larger capacity is achieved in a lower level.
In a directory-based cache coherence control scheme, the L2 cache as described above stores data requested by the group of processor cores in the cluster to which the L2 cache belongs. The group of processor cores is configured to acquire data more frequently from an L2 cache closer to the group of processor cores. In addition, data stored in a main memory is administered by the cluster to which the memory belongs in order to maintain the data consistency.
Further, the cluster administers in what state data in the memory to be administered is and in which L2 cache the data is stored according to this scheme. Moreover, when the cluster receives a request to the memory for acquiring data, the cluster performs appropriate processes for the data acquisition request based on the current state of the data. And then the cluster performs the processes for the data acquisition request and updates the information related to the state of the data.
As illustrated in Patent Document 1, a proposal has been made for reducing the latency required for an access to a main memory in an operation processing apparatus employing the above cluster structure and the above processing scheme. In Patent Document 1, when a cache miss occurs in a cache and the cache does not have capacity available for storing data, data in the memory in the cluster to which the cache belongs is preferentially swept from the cache to create available capacity.
[Patent Document]
According to an aspect of the embodiments, there is provided an operation processing apparatus connected with another operation processing apparatus, including an operation processing unit configured to perform an operation process using first data administered by the own operation processing apparatus and second data administered by another operation processing apparatus and acquired from another operation processing apparatus, a main memory configured to store the first data and third data, and a control unit configured to include a setting unit which sets the operation processing unit to an operating state or a non-operating state and a cache memory which holds the first data, the second data and the third data, wherein when the setting unit sets the operation processing unit to the non-operating state and the third data is requested from another operation processing apparatus, which triggers a cache miss in the cache memory, the control unit reads the requested third data from the main memory and holds the requested third data in the cache memory and sends the read third data to another operation processing apparatus.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the above described technologies, a process for accessing a main memory to write back data to the memory is performed because a cache is temporary storage. A main memory has a large capacity and may be mounted on a chip different from a chip for a group of processor cores and a cache. Thus, an access to a main memory can be a bottleneck for reducing data access latency.
Accordingly, it is an object of one aspect of the technique disclosed herein to provide an operation processing apparatus, an information processing apparatus and a method of controlling an information processing apparatus to reduce the access frequency to a main memory.
First, a comparative example of an information processing apparatus according to one embodiment is described with reference to the drawings.
In the following descriptions, a cluster to which a processor core requesting data stored in a main memory belongs is referred to as Local (cluster). In addition, a cluster to which the memory storing the requested data belongs is referred to as Home (cluster). Further, a cluster which is not Local and holds the requested data is referred to as Remote (cluster). Therefore, each cluster can be Local, Home and/or Remote according to where data is requested to or from. Moreover, a Local cluster also functions as Home in some cases for performing processes related to a data acquisition request. And a Remote cluster also functions as Home in some cases. Additionally, the state information of data stored in a main memory administered by a Home cluster is referred to as directory information. The details of the above components are described later.
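The role definitions above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `cluster_roles`, its parameters and the use of integers as cluster identifiers are hypothetical and do not appear in the embodiments.

```python
def cluster_roles(cluster, requester, home, holders):
    """Return the set of roles that `cluster` plays for one data request.

    cluster:   identifier of the cluster being classified (hypothetical)
    requester: cluster whose processor core issued the request (Local)
    home:      cluster whose main memory stores the requested data (Home)
    holders:   set of clusters whose L2 caches currently hold the data
    """
    roles = set()
    if cluster == requester:
        roles.add("Local")
    if cluster == home:
        roles.add("Home")
    # Remote: not Local, but holds the requested data.
    if cluster != requester and cluster in holders:
        roles.add("Remote")
    return roles


# Cluster 10 requests data stored in its own memory: Local and Home at once.
print(cluster_roles(10, requester=10, home=10, holders=set()))
# Cluster 20 administers the data and also caches it: Home and Remote at once.
print(cluster_roles(20, requester=10, home=20, holders={20}))
```

A cluster can therefore carry more than one role for the same request, which is why the function returns a set rather than a single value.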
As illustrated in
For example, when the cluster 10 acquires data stored not in the memory 102 but in the memory 202, the cluster 10 sends a data request to the cluster 20, to which the memory 202 storing the data belongs. The cluster 20 checks the state of the data. Here, the state of data means the status of use of the data such as in which cluster the data is stored, whether or not the data is being exclusively used, and in what state the synchronization of the data is in the information processing apparatus 1. In addition, when the data to be acquired is stored in the L2 cache 203 belonging to the cluster 20 and the synchronization of the data is established in the information processing apparatus 1, the cluster 20 sends the data to the cluster 10 requesting the data. And then the cluster 20 records in the state information of the data that the data is sent to the cluster 10 and the data is synchronized in the information processing apparatus 1.
The controller 101a uses the tag RAM 103a to check in which state a memory block is stored in the data RAM 103b and the presence of data. The data RAM 103b is a RAM for holding a copy of data stored in the memory 102, for example. The directory RAM 104 is a RAM for handling the directory information of a main memory which belongs to a Home cluster. Since the directory information is a large amount of information, the directory information is stored in a main memory and a cache for the memory is arranged in the RAM in many cases. However, the directory information of the memory which belongs to the Home cluster is stored in the directory RAM 104 in the present embodiment.
The controller 101a accepts requests from the group of processor cores 100 or controllers in L2 cache control units in other clusters. The controller 101a sends operation requests to the tag RAM 103a, the data RAM 103b, the directory RAM 104, the memory 102 or other clusters according to the contents of received requests. And when the requested operations are completed, the controller 101a returns the operation results to the requestors of the operations.
A request for data is sent from a processor core in the cluster 10 which is Local to the L2 cache control unit 101. When the L2 cache control unit 101 in the cluster 10 which is also Home determines that the L2 cache 103 does not hold the data (miss), the L2 cache control unit 101 refers to the directory information stored in the directory RAM 104. And then the L2 cache control unit 101 checks based on the directory information to determine whether or not the data is held by an L2 cache in a Remote cluster. When the L2 cache control unit 101 determines that the L2 cache in the Remote cluster does not hold the data (miss), the L2 cache control unit 101 requests the data from the memory 102 in the cluster 10 which is Local. When the memory 102 returns the data to the L2 cache control unit 101, the L2 cache control unit 101 stores the data in the data RAM 103b in the L2 cache 103. In addition, the L2 cache control unit 101 sends the data to the processor core requesting the data in the group of processor cores 100. Further, the tag RAM 103a in the L2 cache stores information indicating that the data is acquired in the state in which the data is synchronized in the information processing apparatus 1. Further, the directory RAM 104 stores information indicating that the data is held by the cluster 10 which is Local.
When the L2 cache control unit 101 refers to the tag RAM 103a to determine that the data RAM 103b in the L2 cache 103 does not have capacity for storing data, the L2 cache control unit 101 evicts data from the L2 cache 103 according to a predetermined algorithm including a random algorithm and LRU (Least Recently Used) algorithm. When the L2 cache control unit 101 refers to the tag RAM 103a to determine that the data to be evicted is in the state similar to the data stored in the memory 102, the L2 cache control unit 101 discards the data to be evicted. On the other hand, when the L2 cache control unit 101 refers to the tag RAM 103a to determine that the data to be evicted has been updated, the L2 cache control unit 101 writes back the data to be evicted to the memory 102.
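The eviction rule described above (discard clean data, write back updated data) can be sketched as follows. This is a minimal model under stated assumptions: the class name `L2CacheModel`, the dictionary standing in for the memory 102, and the choice of LRU as the predetermined algorithm are illustrative and not taken from the embodiments.

```python
from collections import OrderedDict


class L2CacheModel:
    """Toy L2 cache: LRU eviction, write back only for updated (dirty) data."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # dict: address -> value (stands in for main memory)
        self.lines = OrderedDict()    # address -> (value, dirty); least recently used first

    def _evict(self):
        addr, (value, dirty) = self.lines.popitem(last=False)  # pick the LRU victim
        if dirty:
            self.memory[addr] = value  # updated data is written back to memory
        # clean data matches memory, so it is simply discarded

    def read(self, addr):
        if addr in self.lines:                     # hit: no memory access
            self.lines.move_to_end(addr)
            return self.lines[addr][0]
        if len(self.lines) >= self.capacity:       # no free capacity: evict first
            self._evict()
        value = self.memory[addr]                  # miss: fetch from memory
        self.lines[addr] = (value, False)
        return value

    def write(self, addr, value):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            self._evict()
        self.lines[addr] = (value, True)           # mark the line as updated
        self.lines.move_to_end(addr)


memory = {0: "a", 1: "b", 2: "c"}
cache = L2CacheModel(capacity=2, memory=memory)
cache.read(0)
cache.write(1, "B")
cache.read(2)   # evicts address 0, which is clean and is discarded
cache.read(0)   # evicts address 1, which is dirty and is written back
print(memory[1])
```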
Thus, the data requested by the processor core in the group of processor cores 100 is stored in free space in the data RAM 103b in the L2 cache 103. Additionally, when a processor core in the group of processor cores 100 generates a data acquisition request for the data again, the L2 cache control unit 101 reads the data held in the data RAM 103b and sends the data to the processor core (hit). Therefore, as long as the data is not evicted from the data RAM 103b, the L2 cache control unit 101 does not access the memory 102.
First, the controller 101a checks the tag RAM 103a to determine whether or not a copy of a block of a main memory which stores the data as the target of the data acquisition request is found in the data RAM 103b. When the controller 101a receives a result indicating that the copy is not found (miss) from the tag RAM 103a, the controller 101a refers to the directory RAM 104 to check whether or not the data as the target of the data acquisition request is held by Remote clusters. When the controller 101a receives a result indicating that the data is not held by clusters (miss) from the directory RAM 104, the controller 101a sends a data acquisition request for the data to the memory 102. When the controller 101a receives the data from the memory 102, the controller 101a registers in the directory RAM 104 information indicating that the data is held by a Home cluster. In addition, the controller 101a stores information of the status of use of the data (“Shared” etc.) in the tag RAM 103a. Further, the controller 101a stores the data in the data RAM 103b. Moreover, the controller 101a sends the data to the processor core requesting the data in the group of processor cores 100.
Next,
When the memory 202 returns the data to the L2 cache control unit 201, the L2 cache control unit 201 updates the directory information stored in the directory RAM 204. And the L2 cache control unit 201 sends the data to the cluster 10 which is Local and requesting the data. The L2 cache control unit 101 in the cluster 10 stores in the L2 cache 103 the data received from the L2 cache control unit 201 in the cluster 20. And then the L2 cache control unit 101 sends the data to the processor core requesting the data in the group of processor cores 100.
Here, the data is not stored in the L2 cache 203 in the cluster 20 which is Home for the following reasons. First, the data is requested from a processor core in the cluster 10 which is Local and not requested from a processor core in the cluster 20 which is Home. Second, when the data is stored in the L2 cache 203 in the cluster 20 which is Home, this means that data which is not used by the group of processor cores 200 in the cluster 20 which is Home is stored in the L2 cache 203. Third, when such unused data is stored in the L2 cache 203, data used by the group of processor cores 200 may be evicted from the L2 cache 203.
The controller 101a checks the tag RAM 103a to determine whether or not a copy of a block of a main memory which stores data as the target of the data acquisition request is found in the data RAM 103b. When the controller 101a receives a result indicating that the copy is not found (miss) from the tag RAM 103a, the controller 101a sends a data acquisition request of the data to the controller 201a in the L2 cache control unit 201 which belongs to the cluster 20 which is Home.
When the controller 201a receives the data acquisition request, the controller 201a checks the directory RAM 204 to determine whether or not the data as the target of the data acquisition request is stored in an L2 cache in any cluster. When the controller 201a receives a result indicating that the data is not found in clusters (miss) from the directory RAM 204, the controller 201a sends a data acquisition request for the data to the memory 202. When the memory 202 returns the data to the controller 201a, the controller 201a stores as the status of use of the data in the directory RAM 204 the information indicating that the data is held by the cluster 10 requesting the data. And then the controller 201a sends the data to the controller 101a in the cluster 10 requesting the data. When the controller 101a in the cluster 10 receives the data, the controller 101a stores the status of use of the data (“Shared” etc.) in the tag RAM 103a. In addition, the controller 101a stores the data in the data RAM 103b. Further, the controller 101a sends the data to the processor core requesting the data in the group of processor cores 100.
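The comparative-example sequence above (miss in the Local tag RAM, directory miss at Home, read from the Home memory, no fill of the Home L2 cache) can be sketched as below. The class `ClusterModel` and its method names are hypothetical, and only the directory-miss path described in this passage is modelled.

```python
class ClusterModel:
    """Toy cluster with a memory, a tag RAM, a data RAM and a directory RAM."""

    def __init__(self, name, memory=None):
        self.name = name
        self.memory = dict(memory or {})  # main memory administered by this cluster
        self.tag_ram = {}                 # address -> status of use, e.g. "Shared"
        self.data_ram = {}                # address -> cached copy
        self.directory = {}               # address -> set of cluster names holding the data

    def home_serve(self, addr, requester):
        """Home side: on a directory miss, read memory and record the requester."""
        holders = self.directory.setdefault(addr, set())
        if holders:
            raise NotImplementedError("only the directory-miss path is sketched")
        data = self.memory[addr]          # the Home L2 cache is NOT filled here
        holders.add(requester.name)       # directory: data is held by the requester
        return data

    def local_read(self, addr, home):
        """Local side: check the own L2 cache, otherwise ask the Home cluster."""
        if addr in self.data_ram:         # hit
            return self.data_ram[addr]
        data = home.home_serve(addr, self)
        self.tag_ram[addr] = "Shared"     # status of use
        self.data_ram[addr] = data
        return data


cluster10 = ClusterModel("10")
cluster20 = ClusterModel("20", memory={0x40: 7})
print(cluster10.local_read(0x40, home=cluster20))
print(cluster20.data_ram)    # stays empty in the comparative example
print(cluster20.directory)
```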
Moreover, Write Back to a Remote cluster means processes performed when a cluster evicts data acquired from another cluster from the cache in the cluster. Write Back also means processes for notifying another cluster that the data is so-called “dirty” when the evicted data is updated and is not synchronized in the information processing apparatus 1, that is, the evicted data is dirty. As described below, when a cluster executes Flush Back to a Remote cluster in the comparative example, the cluster sends a Flush Back request to the cluster from which the data is acquired and does not send the data to the cluster from which the data is acquired. To the contrary, when the cluster executes Write Back to a Remote cluster in the comparative example, the cluster sends a Write Back request to the cluster from which the data is acquired and also sends the data to the cluster from which the data is acquired so that the cluster from which the data is acquired stores the data in the memory.
As described above, when new data is stored in an L2 cache and the L2 cache does not have capacity for the data, data stored in the L2 cache is evicted according to a predetermined algorithm. In
In this case, as illustrated in
On the other hand, when the data to be evicted is dirty, a Write Back request and the data are sent to the L2 cache control unit 201 in the cluster 20 which is Home. For example, when data is updated by the group of processor cores 100 in the cluster 10 which is Local, the data becomes dirty. In addition, the L2 cache control unit 201 stores in the directory information stored in the directory RAM 204 information indicating that the data is evicted from the cluster 10 requesting the data. The L2 cache control unit 201 writes back the data to the memory 202 which belongs to the cluster 20 which is Home. It is noted that a processor core in the cluster which is Remote requests the data from the cluster 20 which is Home. Namely, the data is not requested by the group of processor cores 200 in the cluster 20 which is Home. When the data is stored in the L2 cache 203 in the cluster 20 which is Home, other data which the group of processor cores 200 requests may be evicted from the L2 cache 203. Therefore, the data is not stored in the L2 cache 203 in the cluster 20 which is Home.
Next,
First, a processor core in the group of processor cores 100 in the cluster 10 which is Local requests acquisition of data from the L2 cache control unit 101. When the L2 cache control unit 101 receives the data acquisition request, the L2 cache control unit 101 checks whether or not the data is stored in the L2 cache 103. When the data is not stored in the L2 cache 103 (miss), the L2 cache control unit 101 sends an exclusive data acquisition request for the data to the L2 cache control unit 201 in the cluster 20 which is Home. When the L2 cache control unit 201 receives the exclusive data acquisition request, the L2 cache control unit 201 refers to the directory information stored in the L2 cache control unit 201. The directory information indicates which cluster including the Home cluster holds the data. And then the L2 cache control unit 201 sends a discard request of the data to the cluster holding the data indicated by the directory information.
In the example as illustrated in
The controller 101a checks the tag RAM 103a to determine whether or not a copy of the block in the memory which stores the data as the target of the data acquisition request is found in the data RAM 103b. When the controller 101a receives a result indicating that the copy is not found (miss) from the tag RAM 103a, the controller 101a sends a data acquisition request of the data to the controller 201a in the L2 cache control unit 201 which belongs to the cluster 20 which is Home.
When the controller 201a receives the data acquisition request, the controller 201a checks the directory RAM 204 to determine whether or not the requested data is stored in an L2 cache in any cluster. When the controller 201a receives a result indicating that the data is held by the cluster 20 which is Home (hit), the controller 201a sends an invalidation request of the data to the tag RAM 203a. In addition, the controller 201a reads the data from the data RAM 203b. And then the controller 201a invalidates the information indicating that the data is held by a Home cluster in the directory RAM 204. Further, the controller 201a adds the information indicating that the cluster 10 requesting the data holds the data to the directory RAM 204. Moreover, the controller 201a sends the data to the controller 101a in the cluster 10 requesting the data. When the controller 101a in the cluster 10 receives the data, the controller 101a registers the status of use of the data in the tag RAM 103a. Additionally, the controller 101a stores the data in the data RAM 103b. And then the controller 101a sends the data to the processor core requesting the data in the group of processor cores.
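The Home-hit sequence above can be sketched as a single function over plain dictionaries. Everything named here is hypothetical: the function `transfer_from_home_cache`, the dictionary layout, and in particular the status string "Exclusive", since the passage only says that the status of use is registered without naming it. Removing the entry from the Home data RAM is also a simplification of invalidating the tag.

```python
def transfer_from_home_cache(home, local, addr, home_name, local_name):
    """Move a line held in the Home L2 cache to the requesting cluster."""
    holders = home["directory"].get(addr, set())
    if home_name not in holders:
        raise LookupError("directory miss: this path is not sketched here")
    home["tag_ram"].pop(addr, None)      # invalidation request to the Home tag RAM
    data = home["data_ram"].pop(addr)    # read the data out of the Home data RAM
    holders.discard(home_name)           # directory: Home no longer holds the data
    holders.add(local_name)              # directory: the requester now holds it
    home["directory"][addr] = holders
    local["tag_ram"][addr] = "Exclusive"  # assumed status name, see lead-in
    local["data_ram"][addr] = data
    return data


home = {"tag_ram": {0x40: "Shared"}, "data_ram": {0x40: 9},
        "directory": {0x40: {"20"}}}
local = {"tag_ram": {}, "data_ram": {}}
print(transfer_from_home_cache(home, local, 0x40, home_name="20", local_name="10"))
print(home["directory"][0x40])
```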
Next,
However, the group of processor cores 200 in the cluster 20 which is Home is operating in the information processing apparatus 1 in the above comparative example. Therefore, the group of processor cores 100 in the cluster 10 and the group of processor cores 200 in the cluster 20 share the L2 cache 203 in the cluster 20. As a result, the capacity of the L2 cache 203 available to the group of processor cores 200 is substantially decreased. In addition, complicated controls are involved in the L2 cache 203 to determine for example which data requested from which group of processor cores is preferentially stored in the L2 cache 203.
Further, the data evicted from the cluster which is Local is sent to the cluster 20 which is Home regardless of the status of use of the data. That is, in cases other than the case in which the data is updated and becomes dirty in the cluster 10 which is Local, data evicted from the cluster 10 is sent to the cluster 20. Therefore, even when the evicted data is synchronized in the information processing apparatus 1, which means that the data is clean, the data is sent to the cluster 20. Thus, this may lead to the increase of transactions between clusters.
With the above descriptions of the comparative example in mind, an example of an information processing apparatus according to one embodiment is described below with reference to the drawings. In the descriptions below, the operating state and non-operating state of the group of processor cores in each cluster are controlled. Thus, the probability of a cache hit in an L2 cache can be enhanced without increasing the communication traffic, as described later. In addition, complicated administration and control is not involved for each data stored in an L2 cache in the present embodiment.
As illustrated in
The register 501b controls the operation mode of the cluster 50 in the information processing apparatus 2 according to the present embodiment. In the present embodiment, the operation mode includes three modes which are “mode off”, “mode on and processor cores operating” and “mode on and processor cores non-operating”. The operation mode “mode off” is an operation mode in which a cluster operates as described in the above comparative example. The operation mode “mode on and processor cores operating” is an operation mode in which a cluster sets the group of processor cores to an operating state and performs processes in the present embodiment (mode on). The operation mode “mode on and processor cores non-operating” is an operation mode in which a cluster sets the group of processor cores to a non-operating state and performs processes in the present embodiment. The details of the processes in these operation modes are described later.
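The three operation modes can be sketched as an enumeration. The names `OperationMode`, `uses_embodiment_processing` and `cores_operating` are hypothetical, and the actual encoding of the setting values in the register 501b is not specified in the text. Treating the cores as operating in "mode off" is an assumption based on the comparative example, in which the processor cores of each cluster operate.

```python
from enum import Enum


class OperationMode(Enum):
    MODE_OFF = "mode off"
    MODE_ON_CORES_OPERATING = "mode on and processor cores operating"
    MODE_ON_CORES_NON_OPERATING = "mode on and processor cores non-operating"


def uses_embodiment_processing(mode):
    """True when the cluster performs the processes of the present embodiment."""
    return mode is not OperationMode.MODE_OFF


def cores_operating(mode):
    """True when the group of processor cores is in the operating state (assumed for mode off)."""
    return mode is not OperationMode.MODE_ON_CORES_NON_OPERATING


for mode in OperationMode:
    print(mode.value, uses_embodiment_processing(mode), cores_operating(mode))
```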
The controller 501a reads setting values for the register 501b and switches the operation modes according to the setting values. In addition, the operation modes are switched before application execution in the information processing apparatus in the present embodiment. In addition, the OS (Operating System) of the information processing apparatus 2 controls the switching of the operation modes of the register in each cluster. It is noted that the switching of the operation modes can be performed by a user of the information processing apparatus 2 to explicitly instruct the OS or by the OS to autonomously instruct according to the information such as the memory usage of the application.
Additionally,
As illustrated in
An OR gate 601f outputs an instruction signal TagSave2 for storing information of the data in the tag RAM 603a when the AND gate 601e outputs “1” or information of the status of use of the data is stored in the tag RAM 603a according to the processes in the comparative example. An OR gate 601g outputs an instruction signal DataSave2 for storing the data in the data RAM 603b when the AND gate 601e outputs “1” or the data is stored in the data RAM 603b according to the processes in the comparative example. An OR gate 601h outputs an instruction signal DirectoryUpdate(SaveLocal)2 for updating the directory information in the directory RAM 604 when the AND gate 601e outputs “1” or the directory information in the directory RAM 604 is updated according to the processes in the comparative example. Since circuits subsequent to the OR gates 601f to 601h are conventional circuits, the detailed descriptions and drawings of the subsequent circuits are omitted here.
When the controller 601a acquires the requested data from the memory 602, the controller 601a uses the control circuit as illustrated in
In S104, the controller 501a uses the address of the data included in the data acquisition request from the group of processor cores 500 to determine that the data is data stored in the memory 602. Therefore, the controller 501a sends a data acquisition request of the data to the controller 601a.
In S105, the controller 601a checks the directory information in the directory RAM 604 to determine the status of use of the data in the group to which the cluster belongs. The status of use of the data includes information indicating for example whether or not the data is acquired by other clusters. In the present embodiment, in S106, the directory RAM 604 determines that the directory information indicates that the data is not stored in data RAMs in clusters as well as in the data RAM 603b (cache miss). And then the directory RAM 604 sends the information indicating the cache miss to the controller 601a.
In S107, the controller 601a request the memory 602 to read the data requested from the controller 501a. In S108, the memory 602 sends the requested data to the controller 601a. When the controller 601a acquires the data from the memory 602, the control circuit as illustrated in
Therefore, in S109, the controller 601a requests the tag RAM 603a to update the information in the tag RAM 603a to indicate that the acquired data is stored in the data RAM 603b with the “Shared” status. In S110, the tag RAM 603a stores information indicating that the data is stored in the data RAM 603b with the “Shared” status. And the tag RAM 603a notifies the controller 601a that the storing process is completed. In S111, the controller 601a requests the data RAM 603b to store the data. In S112, when the data RAM 603b stores the data the data RAM 603b notifies the controller 601a that the storing process is completed.
In S113, the controller 601a requests the directory RAM 604 to update the directory information to indicate that the data is held by the cluster 50 which is also Remote and the cluster 60 which is Home. In S114, the directory RAM 604 updates the directory information according to the request and notifies the controller 601a that the updating process is completed. In S115, the controller 601a sends the data to the controller 501a.
In S116, the controller 501a requests the tag RAM 503a to update the information in the tag RAM 503a to indicate that the data acquired from the controller 601a is stored in the data RAM 503b. Further, the controller 501a also requests the tag RAM 503a to store the status of use of the data as “Shared”. In S117, when the tag RAM 503a performs the requested process, the tag RAM 503a notifies the controller 501a that the process is completed. In S118, the controller 501a requests the data RAM 503b to store the data. In S119, when the data RAM 503b stores the data the data RAM 503b notifies the controller 501a that the storing process is completed. In S120, the controller 501a sends the data to the processor core requesting the data in the group of processor cores 500.
In the present embodiment, the data acquired from the memory 602 is stored in the L2 cache 603 in the cluster 60 which is Home. In addition, the group of processor cores 600 in the cluster 60 which is Home is set to the non-operating state by the register 601b. Therefore, data storage to the L2 cache 603 is not performed by the group of processor cores 600. Thus, in contrast to the comparative example, the group of processor cores 500 does not encounter so-called cannibalization of memory capacity, that is, a situation in which the memory capacity of the L2 cache 603 is shared with a group of processor cores in another cluster.
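The difference from the comparative example on the Home side can be sketched as follows: when the Home processor cores are non-operating, the Home cluster keeps a copy of the data it reads for another cluster. The function name `home_read_for_remote` and the dictionary layout are hypothetical, and only the cache-miss path of S105 to S115 is modelled.

```python
def home_read_for_remote(home, addr, requester, cores_non_operating):
    """Home side of a read request from another cluster, cache-miss path only."""
    data = home["memory"][addr]                  # S107-S108: read from main memory
    holders = home["directory"].setdefault(addr, set())
    if cores_non_operating:
        home["tag_ram"][addr] = "Shared"         # S109-S110: register the status of use
        home["data_ram"][addr] = data            # S111-S112: keep a copy in the Home L2
        holders.add(home["name"])                # directory: Home holds the data
    holders.add(requester)                       # directory: requester holds the data
    return data                                  # S115: send the data to the requester


home = {"name": "60", "memory": {0x80: 5},
        "tag_ram": {}, "data_ram": {}, "directory": {}}
print(home_read_for_remote(home, 0x80, requester="50", cores_non_operating=True))
print(home["data_ram"], home["directory"])
```

With `cores_non_operating=False` the function behaves like the comparative example and leaves the Home data RAM untouched.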
Next,
Additionally,
An AND gate 601i outputs “1” when the operation mode of the cluster 60 is “mode on and processor cores non-operating”. The AND gate 601i outputs “0” in other cases. In addition, an AND gate 601j outputs “1” when the AND gate 601i outputs “1” and a Write Back request is received from the cluster 50 which is Local for example.
An OR gate 601k outputs an instruction signal TagSave2 for storing data in the tag RAM 603a when the AND gate 601j outputs “1” or data related to the status of use of data is stored in the tag RAM 603a according to the processes in the comparative example. An OR gate 601l outputs an instruction signal DataSave2 for storing data in the data RAM 603b when the AND gate 601j outputs “1” or data is stored in the data RAM 603b according to the processes in the comparative example. An OR gate 601m outputs an instruction signal DirectoryUpdate(SaveLocal)2 for updating directory information in the directory RAM 604 when the AND gate 601j outputs “1” or directory information in the directory RAM 604 is updated according to the processes in the comparative example.
An inverter 601n prohibits storing data in the memory 602 when the operation mode of the cluster 60 is “mode on and processor cores non-operating” and a signal of a Write Back request from the cluster 50 for example is asserted. On the other hand, an AND gate 601o outputs an instruction signal MemorySave2 for storing data in the memory 602 when the operation mode of the cluster 60 is “mode off” or “mode on and processor cores operating” and data is stored in the memory 602 according to the processes in the comparative example. Alternatively, the AND gate 601o outputs the instruction signal MemorySave2 when a Write Back request is not notified from the cluster 50 for example and data is stored in the memory 602 according to the processes in the comparative example. Since circuits subsequent to the OR gates 601k to 601m and the AND gate 601o are conventional circuits, the detailed descriptions and drawings of the subsequent circuits are omitted here.
Consequently, when the group of processor cores 600 in the cluster 60 is in the operating state, the AND gate 601j outputs “0”. Thus, TagSave2, DataSave2, DirectoryUpdate(SaveLocal)2 and MemorySave2 are not asserted when a Write Back request (RequestIsWriteBack) is received from the cluster 50 which is Local. Instead, processes according to the processes in the comparative example are performed based on TagSave, DataSave, DirectoryUpdate(SaveLocal) and MemorySave.
In contrast, the AND gate 601j outputs “1” when the operation mode of the cluster 60 is “mode on and processor cores non-operating” and the controller 601a receives a Write Back request. In this case, the OR gate 601l outputs “1” and the evicted data is stored in the data RAM 603b in the L2 cache 603. Further, since the inverter 601n outputs “0”, the AND gate 601o outputs “0” and the data is not stored in the memory 602. It is noted that a set of the inverter 601n and the AND gate 601o is an example of a blocking unit.
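The gate behaviour described for the Write Back path can be sketched as Boolean expressions. This is one reading of the gate descriptions, not the circuit itself: the function name `write_back_control` and its parameter names are hypothetical, the exact inputs of the AND gate 601i are assumed to be a mode-on flag and a cores-non-operating flag, and the signals of the comparative example (TagSave, DataSave, DirectoryUpdate(SaveLocal), MemorySave) are passed in as plain booleans.

```python
def write_back_control(mode_on, cores_non_operating, request_is_write_back,
                       tag_save=False, data_save=False,
                       directory_update=False, memory_save=False):
    """Return the '2' instruction signals derived from the gates 601i to 601o."""
    and_601i = mode_on and cores_non_operating      # "mode on and cores non-operating"
    and_601j = and_601i and request_is_write_back   # a Write Back arrives in that mode
    inv_601n = not and_601j                         # blocks the store to memory
    return {
        "TagSave2": and_601j or tag_save,                    # OR gate 601k
        "DataSave2": and_601j or data_save,                  # OR gate 601l
        "DirectoryUpdate2": and_601j or directory_update,    # OR gate 601m
        "MemorySave2": inv_601n and memory_save,             # AND gate 601o
    }


# Home cores non-operating: the evicted data goes to the L2 cache, not to memory.
print(write_back_control(True, True, True, memory_save=True))
# Home cores operating: the comparative-example store to memory goes through.
print(write_back_control(True, False, True, memory_save=True))
```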
Here, as illustrated in
The controller 601a in the cluster 60 which is Home receives the above Write Back request from the controller 501a in the cluster 50 which is Local. The controller 601a then stores in the data RAM 603b the data which is received along with the Write Back request, that is, the data evicted from the data RAM 503b. In addition, the controller 601a updates the information stored in the tag RAM 603a to indicate that the data is stored in the data RAM 603b. And then the controller 601a requests the directory RAM 604 to update the directory information to indicate that the data is added to the cluster 60 which is Home. Further, the controller 601a requests the directory RAM 604 to indicate that the data is discarded from the cluster 50 which is Local.
When the controller 501a receives the data evicted from the data RAM 503b, the controller 501a sends in S205 a Write Back request with the data to the controller 601a. The controller 501a sends the Write Back request to the controller 601a since the status of use of the data retrieved from the tag RAM 503a in S202 is dirty. In addition, the controller 501a sends to the controller 601a the address which indicates in which cluster the data is stored in a main memory.
In S206, the controller 601a requests the tag RAM 603a to register the information which indicates that the data sent from the controller 501a is stored in the data RAM 603b. In addition, the controller 601a requests the tag RAM 603a to register the address which indicates in which cluster the data is stored in a main memory. In S207, the tag RAM 603a performs the registration process according to the request from the controller 601a and notifies the controller 601a that the process is completed. In S208, the controller 601a stores the data in the data RAM 603b. In S209, the data RAM 603b stores the data and notifies the controller 601a that the storing process is completed.
In S210, the controller 601a requests the directory RAM 604 to update the directory information to indicate that the data is held by the cluster 60 which is Home. Further, the controller 601a requests the directory RAM 604 to update the directory information to indicate that the data is discarded from the cluster 50 which is Local as well as Remote. In S211, the directory RAM 604 updates the directory information and notifies the controller 601a that the updating process is completed. In S212, the controller 601a notifies the controller 501a that the above processes are completed.
It is noted that in a cluster a directory RAM uses the directory information to administer which cluster retrieves each data stored in a data RAM by use of a bit corresponding to each cluster. For example, for each data a bit “1” is used for a cluster which holds the data and a bit “0” is used for a cluster which does not hold the data. Therefore, for example, in S210 as described above, the directory RAM 604 sets the bit for the cluster 60 to “1” and sets the bit for the cluster 50 to “0”. In the following descriptions, a directory RAM changes the bits in the directory information to register the status of use of each data. However, the configuration for administering the status of data retrieved by clusters in the directory RAM is not limited to the above embodiment. Since the processes performed by the controller 601a are the same as above when the controller 501a sends a Flush Back request to the controller 601a, the detailed descriptions of the processes are omitted here.
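The handling of a Write Back request in S206 to S212, together with the per-cluster directory bits, can be sketched as follows. This is a minimal model under stated assumptions: the class `HomeCluster`, its dictionaries standing in for the tag RAM, the data RAM and the directory RAM, and the use of small integers as cluster indices are all hypothetical and are not taken from the embodiment.

```python
class HomeCluster:
    """Hypothetical model of the Home-side controller and its RAMs."""

    def __init__(self, home_index: int, num_clusters: int):
        self.home_index = home_index
        self.num_clusters = num_clusters
        self.tag_ram = {}    # address -> True when the data is in the data RAM
        self.data_ram = {}   # address -> data
        self.directory = {}  # address -> list of bits, one per cluster

    def handle_write_back(self, local_index: int, address: int, data: bytes) -> str:
        # S206-S207: register the address in the tag RAM.
        self.tag_ram[address] = True
        # S208-S209: store the evicted data in the data RAM.
        self.data_ram[address] = data
        # S210-S211: set the bit of the Home cluster to 1 and clear the bit
        # of the Local cluster that evicted the data.
        bits = self.directory.setdefault(address, [0] * self.num_clusters)
        bits[self.home_index] = 1
        bits[local_index] = 0
        # S212: report completion to the requesting controller.
        return "completed"
```

For example, if index 0 stands for the cluster 50 and index 1 for the cluster 60, a directory entry of `[1, 0]` becomes `[0, 1]` after the Write Back request is processed.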
An example of the advantages obtained when the operation mode of each cluster is controlled according to the present embodiment is described with reference to
For example, it is assumed that the cluster 900a outside of the group 800 is permitted to access the cluster 800c inside of the group 800. Further, it is assumed that the cluster 900a sends an exclusive data acquisition request to the cluster 800c to acquire data stored in the L2 cache in the cluster 800c. In this case, the data is moved to the cluster 900a and discarded from the L2 cache in the cluster 800c. In addition, the cluster 800c administers the directory information to indicate that the data is held by the cluster 900a, which is outside of the group 800. In the example as illustrated in
In the above comparative example, the groups of processor cores in the clusters which are Remote and Home in addition to the Local clusters are in the operating state. Therefore, the L2 caches in the Local clusters exchange data with other clusters. Thus, the capacity of the L2 cache available to the group of processor cores in the Local cluster is substantially reduced. Further, in the administration of data in the L2 cache, determination criteria and controls are more complicated, partly because it is determined which data from which cluster is preferentially acquired or stored in the L2 cache. As a result, the configurations in the comparative example can lead to larger cost-related overhead and performance-related overhead in comparison with the configurations in the present embodiment. Moreover, in the comparative example, the data administration involves, for example, storing additional information indicating from which cluster each data is evicted. In contrast, the administration of such additional information is not involved in the present embodiment.
In addition, common rules for the protocols used for the cache coherence control can be applied regardless of whether the operation mode of the group of processor cores is “mode on” or “mode off”. For example, it is assumed here that the MESI protocol employing the four states, Modified, Exclusive, Shared and Invalid, is used when the operation mode of the group of processor cores is “mode on”. In this case, this MESI protocol can be used without defining a new state when the operation mode of the group of processor cores is “mode off”. In addition, the control processes can be modified for the “mode on” mode and the “mode off” mode accordingly. Therefore, the workload can be reduced when the configurations according to the present embodiment are applied to the configurations according to the comparative example.
Although the present embodiment is described as above, the configurations and the processes of the information processing apparatus are not limited to those as described above and various variations may be made to the embodiment described herein within the technical scope of the present invention. For example, in the above embodiment, when the cluster 50 which is Local sends an exclusive data acquisition request to the cluster 60 which is Home, processes are performed according to the comparative example. Namely, the cluster 60 acquires the requested data from the L2 cache 603, sends the data to the cluster 50 and discards the data from the L2 cache 603. The exclusive data acquisition request is a data acquisition request used mainly when a cluster requesting the data updates the data in the cluster. Therefore, when the data is evicted from the cluster 50, the data is sent to the cluster 60 which is Home along with a Write Back request since the data is dirty.
However, in some applications executed in an information processing apparatus, data acquired by a Local cluster using an exclusive data acquisition request may be evicted from the Local cluster without being updated. That is, the data, which is clean, is evicted from the Local cluster. With this in mind, a configuration can be employed such that when a Local cluster sends an exclusive data acquisition request to a Home cluster, the requested data is not discarded from the L2 cache in the Home cluster. However, when an exclusive data acquisition request is generated, the status of use of the requested data is registered as not “Exclusive” but “Shared” in the tag RAM in the Home cluster. Therefore, when the protocol is modified so as to administer data in this manner, transactions between clusters and transactions between a cluster and a main memory do not increase in comparison with the comparative example. Thus, a system architect of an information processing apparatus can arbitrarily employ a configuration in view of the specifications of the information processing apparatus and the types of applications executed in the information processing apparatus.
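The variation described above, in which the Home cluster keeps its copy when serving an exclusive data acquisition request, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `keep_copy` flag and the dictionaries standing in for the tag RAM and the data RAM of the Home cluster are hypothetical.

```python
def serve_exclusive_request(home_tags: dict, home_data: dict,
                            address: int, keep_copy: bool) -> bytes:
    """Return the requested data and update the Home L2 cache state."""
    data = home_data[address]
    if keep_copy:
        # Variation: the data stays in the Home L2 cache and its status of
        # use is registered as "Shared" rather than "Exclusive".
        home_tags[address] = "Shared"
    else:
        # Behavior following the comparative example: the data is discarded
        # from the Home L2 cache once it is sent to the Local cluster.
        del home_tags[address]
        del home_data[address]
    return data
```

When the Local cluster later evicts the data without having updated it, the copy retained by the Home cluster is still valid under the variation.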
Additionally, as for switching between “mode on” and “mode off”, the operation mode can be set to “mode on” when an application is executed using a large amount of memory space exceeding the capacity of a main memory in a cluster. Conversely, the operation mode is set to “mode off” when an application is executed using memory space which does not exceed the capacity of the memory in the cluster. Thus, appropriate configurations of memories and L2 caches can be employed flexibly for each application in the information processing apparatus. Moreover, efforts for establishing configurations of memories and L2 caches for each application can be omitted.
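The switching rule above can be written as a one-line decision. The function name and the byte-count parameters are hypothetical; the embodiment does not specify how the memory space used by an application is measured.

```python
def choose_operation_mode(app_memory_bytes: int, cluster_memory_bytes: int) -> str:
    """Select "mode on" only when the application exceeds one cluster's memory."""
    if app_memory_bytes > cluster_memory_bytes:
        return "mode on"
    return "mode off"
```

For instance, under this sketch an application using 48 GiB on clusters whose memories hold 32 GiB each would run in “mode on”, while one using 16 GiB would run in “mode off”.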
Further, when the power supply for the group of processor cores is individually controlled for each cluster, the group of processor cores which is set in the non-operating state when the operation mode is set to “mode on” can be turned off. Therefore, unnecessary electricity consumption can be reduced in the information processing apparatus. It is noted that so-called power gating can be employed to control the power supply to each group of processor cores in the above embodiment.
Moreover, in the above descriptions, a register is employed to set a group of processor cores to operating state or non-operating state. Instead of the configurations of the L2 cache control unit as described in the above embodiment, configurations as illustrated in
<<Computer Readable Recording Medium>>
It is possible to record a program which causes a computer to implement any of the functions described above on a computer readable recording medium. Here, the functions include setting of a register for example. In addition, by causing the computer to read in the program from the recording medium and execute it, the function thereof can be provided. Here, the computer includes clusters and controllers for example.
The computer readable recording medium mentioned herein indicates a recording medium which stores information such as data and a program by an electric, magnetic, optical, mechanical, or chemical operation and allows the stored information to be read from the computer. Of such recording media, those detachable from the computer include, e.g., a flexible disk, a magneto-optical disk, a CD-ROM, a CD-R/W, a DVD, a DAT, an 8-mm tape, and a memory card. Of such recording media, those fixed to the computer include a hard disk and a ROM (Read Only Memory).
An operation processing apparatus, an information processing apparatus and a method of controlling an information processing apparatus according to one embodiment may reduce the access frequency to a main memory.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-062811 | Mar 2013 | JP | national |