This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2012-190442, filed on Aug. 30, 2012, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are directed to a processor, an information processing apparatus, and a control method of the processor.
There is an information processing apparatus in which plural CPU (Central Processing Unit) nodes as processors are connected with each other, and a memory as a main storage unit belonging to each CPU node is shared by each of the plural CPU nodes (for example, refer to Patent Documents 1, 2). Hereinafter, a data transfer method between nodes in a ccNUMA (cache coherent Non-Uniform Memory Access, distributed shared memory) method, which is made up of a cache control part and so on receiving a load request issued by an arithmetic processing section (CORE part), is described as illustrated in
In
The cache control part 13 selects one request based on a priority order set in advance, and executes processes corresponding to the selected request. The cache memory part 14 is a secondary cache memory holding data blocks stored at a memory 18 being a main storage area. The cache data management part 15 is a resource of the CPU node 10 being a request source, and performs management of addresses and data relating to writing to a cache memory. The memory management part 16 manages information of the memory 18 being the main storage area managed as a home. The remote management part 17 receives a request from the memory management part 16 of the other CPU node, and transmits a data block when the request is hit at the cache memory of its own CPU node.
When the arithmetic processing section (CORE part) 11 issues the load request to the main storage area, the cache control part 13 judges the CPU node 10 to which the memory 18 storing the requested data block belongs based on an address space definition defined by a system. For example, CPU-IDs are assigned to a certain address field in the address space definition, and it is judged which memory 18 of the CPU nodes 10 stores the data block based on the CPU-ID. Each data block is managed in units of a cache line size, and all data blocks of the memory 18 have directory information (header information). The directory information contains information indicating whether or not the data block is the latest one, information indicating which cache memory of the CPU nodes 10 has the data block, and so on.
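As an illustrative sketch, the judgment of the home CPU node from an address and the cache-line-unit management described above can be expressed as follows. The field position, CPU-ID width, and cache line size below are assumptions for illustration, not values given in the embodiment.

```python
CPU_ID_SHIFT = 40       # assumed bit position of the CPU-ID field in the address
CPU_ID_MASK = 0xF       # assumed 4-bit CPU-ID field (up to 16 CPU nodes)
CACHE_LINE_SIZE = 128   # assumed cache line size in bytes

def home_cpu_id(address: int) -> int:
    """Extract the CPU-ID field to judge which CPU node's memory 18
    stores the requested data block."""
    return (address >> CPU_ID_SHIFT) & CPU_ID_MASK

def cache_line_base(address: int) -> int:
    """Data blocks are managed in units of the cache line size, so an
    access maps to the base address of its cache line."""
    return address & ~(CACHE_LINE_SIZE - 1)
```

For example, with these assumed widths, an address whose CPU-ID field holds 3 is judged to belong to the memory of CPU node 3.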
A data transfer path at an information processing apparatus illustrated in
In the transfer path at the above-stated information processing apparatus, the memory 18 or the remote management part 17 transmits data to the memory management part 16, and the memory management part 16 transmits the data to the cache data management part 15; therefore, the latency relating to the data transfer becomes long, and it is wasteful. Besides, data resources of the memory management part 16 are necessary in addition to those of the cache data management part 15, because the data of the memory 18 is transmitted also to the memory management part 16 even within the same CPU node 10.
An aspect of a processor includes: a cache memory; an arithmetic processing section that issues a load request for loading object data stored in a main storage unit into the cache memory; a control part that performs a process corresponding to the load request received from the arithmetic processing section; a memory management part that requests, from the main storage unit, the object data corresponding to the request from the control part and header information containing information indicating whether or not the object data is the latest, and receives the header information returned by the main storage unit in response to the request; and a data management part that manages write control of the data acquired by the load request to the cache memory, and receives the object data returned by the main storage unit in response to the request.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
A preferred embodiment will be explained with reference to the accompanying drawings.
A configuration of an information processing apparatus according to an embodiment is the same as that of the information processing apparatus illustrated in
The cache control part 13 selects one request based on a priority order set in advance, and performs processes corresponding to the selected request. The cache memory part 14 is the secondary cache memory, and holds data blocks stored at the memory 18 being the main storage area. The cache data management part 15 manages addresses and data relating to writing to the cache memory including the cache memory part 14. The memory management part 16 manages information of the memory 18 being the main storage area managed as a home. The remote management part 17 receives a request from the memory management part 16 of the other CPU node, and transmits a data block when the cache memory of its own CPU node is hit for the request.
In data transfers illustrated in
A cache control part 13B and so on request the data existing at the cache memory of the CPU-B node 10B from a remote management part 17B (R24, R25). A data D21 transmitted from the remote management part 17B to the CPU-A node 10A is received by the cache data management part 15A as a response to the request, without being intervened by the memory management part 16A. Header information I22 containing directory information transmitted from the remote management part 17B to the CPU-A node 10A is transmitted to the cache data management part 15A via the memory management part 16A (I24). The cache data management part 15A transmits a data D22 to the cache control part 13A.
In the present embodiment, the memory 18 or the remote management part 17 transmits the data to the cache data management part 15 without being intervened by the memory management part 16 as illustrated in
A configuration example of the cache data management part according to the present embodiment for the data transfer by the data transfer paths illustrated in
The cache data management part 15 includes a header management part 22, a data part 23, a select circuit 24, and a data path control part 25. The data from the memory 18 (the memory management part 16 of the other CPU node) and the remote management part 17 are constantly transmitted to the cache data management part 15, and the write timing thereof is controlled by an ID.
The write timing by the ID is described with reference to
Next, the memory management part 16 transmits the cache data management part ID and the memory management part ID to the memory 18 (S12). In response to the above-stated operation, the memory 18 transmits the cache data management part ID and the memory management part ID to the memory management part 16 (S13), and the memory management part 16 transmits the cache data management part ID and the memory management part ID to the cache data management part 15 (S14). Besides, when the latest data exists at the other CPU node, the memory management part 16 transmits the cache data management part ID and the memory management part ID to the remote management part 17 of the other CPU node (S15) after the memory management part 16 receives the cache data management part ID and the memory management part ID from the memory 18. In response to the above-stated operation, the remote management part 17 transmits the cache data management part ID and the memory management part ID to the memory management part 16 and the cache data management part 15 (S16, S17).
A timing of the ID transmitted from the memory 18 and a timing of the ID transmitted from the remote management part 17 are different as stated above; therefore, the write timing of the data to the cache data management part 15 is controlled by the ID. At the cache data management part 15, data from the memory 18 (the memory management part 16 of the other CPU node) or the remote management part 17 is received by a two-port write processing part 21B at an entry indicated by the ID, which performs the writing to the data part 23. Besides, at the cache data management part 15, header information from the memory management part 16 or the remote management part 17 is received by a two-port write processing part 21A at an entry indicated by the ID, which performs the writing to the header management part 22.
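A minimal sketch of this ID-controlled write, assuming a simple entry array per part (the class and method names below are illustrative, not names used in the embodiment):

```python
class CacheDataManagementPart:
    """Sketch of cache data management part 15: entries are selected by the
    cache data management part ID carried with each transfer."""

    def __init__(self, num_entries: int):
        self.data_part = [None] * num_entries      # data part 23
        self.header_part = [None] * num_entries    # header management part 22

    def write_data(self, entry_id: int, data: bytes) -> None:
        """Models two-port write processing part 21B: data arriving from the
        memory 18 (or the other node's memory management part 16) or from
        the remote management part 17 is written to the entry the ID names."""
        self.data_part[entry_id] = data

    def write_header(self, entry_id: int, header: dict) -> None:
        """Models two-port write processing part 21A: header information from
        the memory management part 16 or the remote management part 17."""
        self.header_part[entry_id] = header
```

Because each arriving transfer carries the entry ID, data from the two possible sources can be written at different timings without further arbitration.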
In the writing of data according to the present embodiment, the writing to the cache data management part 15 is instructed by two flags D and d contained in a header of the data illustrated in
Here, when the data is a valid latest data, a completion of the transfer is notified. Accordingly, in the present embodiment, for example, the writing of the latest data held by the data part 23 of the cache data management part 15 to the cache memory is performed with reference to the flags D, R, M of the header information held at the header management part 22. The flag D indicates that the data is held; the flag R indicates that the resource secured at the memory management part 16 has received a completion response from the remote management part 17 and that process completion of the memory management part 16 is indicated to the cache data management part 15; and the flag M indicates a response from the remote management part 17. Correspondences between values of the flags D, R, M and states thereof are illustrated in
The cache data management part 15 judges the states of the flags D, R, M by the select circuit 24, and, when (D, R, M) = (1, 0, 0) or (1, 1, 1), treats the transmitted data as the latest data and enters a state in which a data valid indication is received. Here, (D, R, M) = (1, 0, 0) represents a valid latest data from the memory 18, and (D, R, M) = (1, 1, 1) represents a valid latest data from the remote management part 17. By providing these flags D, R, and M, it is possible to discriminate between the latest data from the memory 18 and the latest data from the remote management part 17, and to write the data to the cache memory. This data valid indication state and a request instruction from the cache control part 13 are transmitted to the data path control part 25, and the data is written from the data part 23 of the cache data management part 15 to the cache memory.
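The judgment made by the select circuit 24 can be sketched as two small functions (the function names are assumptions for illustration):

```python
def data_valid(d: int, r: int, m: int) -> bool:
    """True when the held data is valid latest data that may be written
    from the data part 23 to the cache memory."""
    return (d, r, m) in ((1, 0, 0), (1, 1, 1))

def data_source(d: int, r: int, m: int):
    """Discriminate where the valid latest data came from."""
    if (d, r, m) == (1, 0, 0):
        return "memory"        # latest data from the memory 18
    if (d, r, m) == (1, 1, 1):
        return "remote"        # latest data from the remote management part 17
    return None                # transfer not yet complete
```

Any other combination (for example, D = 1 with only one of R and M set) means the remote transfer has not completed, so no data valid indication is raised.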
A flowchart of operations from the read request to the write to the cache memory while focusing on the flags is illustrated in
When the read request is issued, the cache control part 13A of the CPU-A node 10A judges whether or not L==H (S101). Here, L==H indicates that the requested data is stored at the memory 18 belonging to its own CPU node. Namely, the cache control part 13A judges at step S101 whether or not the requested data is stored at the memory 18A. When L==H as a result of the judgment at step S101, a resource of the memory management part 16 of the CPU-A node 10A is secured (S102), and a directory at the memory 18A is checked (S103). The flag is set at d=1, and the data is transmitted from the memory 18A to the cache data management part 15A (S104).
Next, the memory management part 16A judges whether or not the latest data exists at the memory 18A based on directory information contained in header information (S105). When it is judged that the latest data exists at the memory 18A as a result of the judgment at the step S105, the memory management part 16A sets the flags at (D, R, M)=(1, 0, 0), and transmits the header information (S106). The cache data management part 15A judges that the flags of the header information are (D, R, M)=(1, 0, 0) by the select circuit 24 (S107), and performs the writing to the cache memory.
When it is judged that the latest data does not exist at the memory 18A as the result of the judgment at the step S105, the data is transmitted from the remote management part 17B (17C) of the CPU node other than the CPU-A node 10A to the cache data management part 15A while setting the flags at D=1, M=1 (S108). Next, a completion response is issued from the remote management part 17B (17C) to the memory management part 16A of the CPU-A node 10A, and the resource is released (S109). The memory management part 16A sets R=1, and the data is transmitted to the cache data management part 15A (S110). The cache data management part 15A judges that the flags of the header information are (D, R, M)=(1, 1, 1) by the select circuit 24 (S111), and performs the writing to the cache memory.
When L==H is not true as the result of the judgment at the step S101, the process goes to step S112. Here, the requested data is not stored at the memory 18A but is stored at the memory 18B belonging to the CPU-B node 10B. A resource of the memory management part 16 of the CPU-B node 10B is secured (S112), and a directory at the memory 18B is checked (S113). The memory management part 16B judges whether or not the latest data exists at the memory 18B based on directory information contained in header information (S114). When it is judged that the latest data exists at the memory 18B as a result of the judgment at the step S114, the memory management part 16B sets the flags at (D, R, M)=(1, 0, 0), and transmits the header information (S115). The cache data management part 15A judges that the flags of the header information are (D, R, M)=(1, 0, 0) by the select circuit 24 (S116), and performs the writing to the cache memory.
When it is judged that the latest data does not exist at the memory 18B as the result of the judgment at the step S114, the data is transmitted from the remote management part 17C of a CPU-C node 10C to the cache data management part 15A while setting the flags D=1, M=1 (S117). Next, a completion response is issued from the remote management part 17C to the memory management part 16A of the CPU-A node 10A, and the resource is released (S118). The memory management part 16A sets R=1, and the data is transmitted to the cache data management part 15A (S119). The cache data management part 15A judges that the flags of the header information are (D, R, M)=(1, 1, 1) by the select circuit 24 (S120), and performs the writing to the cache memory.
In the present embodiment, it is possible for the memory management part 16 to omit a data storage part 32 for a request in which the request source CPU node (CPU (L)) and the CPU node having the data (CPU (H)) are the same (L==H). In the ccNUMA method, it is possible to share a vast main storage area among a number of CPU nodes, but it is preferable to tune software such that a local main storage area belonging to its own CPU node is accessed, to sufficiently increase processing performance. An OS (operating system) actually supporting the ccNUMA configuration and its development environment implement a function called MPO (Memory Placement Optimization), and programs are written to access the local main storage area.
There is database processing software as a usage in which the access ratio to a remote memory not belonging to its own CPU node is large, but the ratio of local requests to remote requests is statistically approximately 1:1. Accordingly, there is no problem if the local request ratio to the remote request ratio is assumed to be 1:1, or the local request ratio is assumed to be higher than that, when the general ccNUMA configuration is used. By applying the technology of the present embodiment, for a request in which the request source CPU node (CPU (L)) and the CPU node having the data (CPU (H)) are the same, the data transfer to the cache data management part 15 is performed without being intervened by the data resource of the memory management part 16. Accordingly, such a request does not use the data resource of the memory management part 16. On the other hand, a request in which the request source CPU node (CPU (L)) and the CPU node having the data (CPU (H)) are not the same is intervened by the data resource of the memory management part 16.
A configuration example of the memory management part in the present embodiment is illustrated in
The memory management part in the present embodiment can be made up of entries including both the header management part and the data part (entries with the data part) and entries including only the header management part (entries without the data part) as stated above. It is controlled such that a request in which the request source CPU node (CPU (L)) and the CPU node having the data (CPU (H)) are the same is allocated to an entry without the data part, and a request in which they are not the same is allocated to an entry with the data part. Further, a request in which the request source CPU node (CPU (L)) and the CPU node having the data (CPU (H)) are the same may be allocated to an entry with the data part when there is no vacant entry among the entries without the data part at the memory management part 16.
A flowchart as for a resource acquisition is illustrated in
The cache control part 13 judges whether or not the read request is the read request L-REQ from inside the request source CPU node (S201). As a result, when the read request is not the L-REQ, the cache control part 13 acquires a resource of the entry with the data part at the memory management part (S202). On the other hand, when the read request is the L-REQ, the cache control part 13 decodes the address, and judges whether or not the request source CPU node (CPU (L)) and the CPU node having the latest data (CPU (H)) are the same (S203). When the request source CPU node (CPU (L)) and the CPU node having the latest data (CPU (H)) are not the same, the cache control part 13 acquires a data resource of the cache data management part (S207).
When the read request is the L-REQ and the request source CPU node (CPU (L)) and the CPU node having the latest data (CPU (H)) are the same, the cache control part 13 judges whether or not the entry without the data part is vacant at the memory management part (S204). When the entry without the data part is vacant at the memory management part, the cache control part 13 acquires the data resource of the cache data management part and the resource of the entry without the data part of the memory management part (S205). On the other hand, when the entry without the data part is not vacant at the memory management part, the cache control part 13 acquires the resource of the entry when the entry with the data part of the memory management part is vacant (S206).
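The resource acquisition flow of steps S201 through S207 can be sketched as a single decision function. The function name and the returned resource labels below are hypothetical; they simply tag which resources the cache control part 13 acquires at each branch.

```python
def acquire_resources(is_l_req: bool, cpu_l: int, cpu_h: int,
                      entry_without_data_vacant: bool,
                      entry_with_data_vacant: bool) -> list:
    """Sketch of the resource acquisition judgment (S201-S207)."""
    if not is_l_req:                                   # S201: not an L-REQ
        return ["mm_entry_with_data"]                  # S202
    if cpu_l != cpu_h:                                 # S203: CPU (L) != CPU (H)
        return ["cdm_data"]                            # S207
    if entry_without_data_vacant:                      # S204
        return ["cdm_data", "mm_entry_without_data"]   # S205
    if entry_with_data_vacant:
        return ["mm_entry_with_data"]                  # S206
    return []                                          # no vacant entry yet
```

Only the L==H path with a vacant data-less entry avoids occupying a data resource of the memory management part, which is the saving the embodiment aims at.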
The optimum ratio of entries with the data part to entries without the data part differs depending on the usage, but it is possible to reduce CPU chip area and power consumption without lowering performance in the major part of processes when the ratio is set to approximately 1:1, at which the general remote request ratio becomes the maximum.
Flows of the data transfers at the data transfer paths illustrated in
The cache control part 13A transmits the load request R31 to the cache data management part 15A, and a resource at the cache data management part 15A is secured. The cache data management part 15A transmits a load request R32 to the CPU-B node 10B, and it is received by the memory management part 16B via the cache control part 13B. The memory management part 16B of the CPU-B node 10B requests data and directory information from the memory 18B (R33). The memory management part 16B receives, from the memory 18B as a response to the request (R33), header information I31 and information indicating that the latest data exists at the other CPU-C node 10C.
A cache control part 13C and so on request the data existing at a cache memory of the CPU-C node 10C from a remote management part 17C (R35, R36). Header information I32 transmitted from the remote management part 17C in response to the above-stated request is transmitted to the cache data management part 15A of the CPU-A node 10A via the memory management part 16B of the CPU-B node 10B (I34). A data D31 transmitted from the remote management part 17C is transmitted to the cache data management part 15A of the CPU-A node 10A, and a data D32 is transmitted to the memory management part 16B of the CPU-B node 10B. The cache data management part 15A transmits a data D33 to the cache control part 13A. A flow of the data transfer at the data transfer path illustrated in
In an information processing apparatus in which plural CPU nodes are connected with each other, requested data is transmitted from a memory to a data management part of the CPU node without being intervened by a memory management part of the CPU node, and thereby it is possible to shorten the latency relating to the data transfer.
All examples and conditional language provided herein are intended for the pedagogical purpose of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority or inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind
---|---|---|---
2012-190442 | Aug 2012 | JP | national