This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2009-162388 filed on Jul. 9, 2009, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a cache memory.
Recently, increased operating frequency of processors has resulted in relatively longer delays during memory access, thereby affecting performance of the entire system. Many processors are therefore provided with a high-speed, small-capacity memory called a cache memory to hide the delay of a memory access.
Each cache way 1602 is constituted of a plurality of cache blocks (hereinafter, sometimes referred to as “blocks”) 1603 (#1) to 1603 (#n). For example, n=1024.
Each cache block 1603 is constituted of a validity flag, a tag and a data field. For example, the validity flag occupies 1 bit, the tag 15 bits and the data field 128 bytes.
Data size of the cache memory 1601 is, for example, 512 kilobytes, which is obtained by calculating (data size of cache block×number of cache indices×number of cache ways = 128 bytes×1024 indices×four ways), with 1024 bytes being 1 kilobyte.
An address 1605 for a memory access designated by a program is constituted of 32 bits, with the lowest 7 bits occupied by an offset in the cache block, intermediate 10 bits occupied by an index and the highest 15 bits occupied by a tag.
When reading of data with respect to the address 1605 is instructed, a cache set represented by a 10-bit index in the address 1605 is selected. In response to the instruction, each cache block 1603 (#i) of a corresponding index is read out from each of the cache ways 1602 (#1) to 1602 (#4) and then input into each of comparators 1604 (#1) to 1604 (#4).
The comparators 1604 (#1) to 1604 (#4) detect matching and mismatching between the tag value in each read cache block 1603 (#i) and the tag value in the designated address 1605. A cache hit is made in the cache block 1603 (#i) read at the comparator 1604 in which matching between the tag values has been detected among the comparators 1604 (#1) to 1604 (#4). Then, data in the way in which matching between the tag values was detected is read out. In this manner, data may be read out at a speed higher than when read out from the main memory.
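The readout logic described above may be summarized in a short C sketch. This is a minimal software model of the hardware, not the circuit itself; the type and function names are illustrative, and the bit widths follow the example above (7-bit offset, 10-bit index, 15-bit tag).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS   4
#define NUM_SETS   1024            /* cache indices            */
#define BLOCK_SIZE 128             /* bytes per data field     */

/* One cache block: validity flag, tag and data field. */
struct cache_block {
    bool     valid;
    uint16_t tag;                  /* 15-bit tag               */
    uint8_t  data[BLOCK_SIZE];
};

/* ways[w][index] models the cache ways 1602 (#1) to 1602 (#4). */
struct cache_block ways[NUM_WAYS][NUM_SETS];

/* 32-bit address: lowest 7 bits offset, next 10 bits index,
 * highest 15 bits tag.                                         */
static uint32_t addr_index(uint32_t a) { return (a >> 7) & 0x3FFu; }
static uint32_t addr_tag(uint32_t a)   { return a >> 17; }

/* Returns the hit way number, or -1 on a cache miss.  Each tag
 * comparison models one of the comparators 1604 (#1) to (#4).  */
int lookup(uint32_t addr)
{
    for (int w = 0; w < NUM_WAYS; w++) {
        struct cache_block *b = &ways[w][addr_index(addr)];
        if (b->valid && b->tag == addr_tag(addr))
            return w;              /* cache hit                 */
    }
    return -1;                     /* miss: access main memory  */
}
```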
When none of the comparators 1604 detects matching between the tag values, or when a validity flag represents invalidity of the corresponding cache block even though matching between the tag values is detected, no cache hit is made and data is read out from the address 1605 on the main memory.
When writing of data to the address 1605 is instructed, a cache block 1603 (#i) is designated among the cache blocks 1603 (#1) to 1603 (#n) on the basis of the 10-bit index and the tag of the address 1605, in the same manner as in the data readout.
When a cache miss occurs, a cache block to be replaced is selected in accordance with a way selection algorithm.
Examples of such a way selection algorithm include a Least Recently Used (LRU) algorithm, with which data in the least recently used cache block is selected and replaced (i.e., removed).
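A minimal sketch of LRU victim selection follows, assuming per-way access timestamps; real hardware typically tracks age with a few bits per way rather than full timestamps, so this is illustrative only.

```c
#include <stdint.h>

#define NUM_WAYS 4

/* Last-access time of each way in one cache set; a smaller value
 * means "less recently used".                                     */
static uint64_t last_used[NUM_WAYS];
static uint64_t now;

/* Record an access to the given way. */
void lru_touch(int way) { last_used[way] = ++now; }

/* Pick the least recently used way as the replacement victim. */
int lru_victim(void)
{
    int victim = 0;
    for (int w = 1; w < NUM_WAYS; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}
```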
As is apparent from the foregoing description, when a wide range of data is accessed, a plurality of data pieces may have the same index value in the address 1605, resulting in cache conflicts. In the set associative cache memory, however, even when indices designate the same cache set among the cache sets #1 to #n, cache blocks may be selected from a plurality of ways since not all the ways are necessarily in use. In the 4-way cache memory 1601, up to four cache blocks having the same index may be held simultaneously.
With the recent spread of multi-process environments and virtual machine usage environments, a cache memory configured as described above is shared by an increasing number of processes, which conflict with one another over the cache area and degrade overall performance.
Several approaches have been proposed in order to address this problem. A first approach is called modified LRU replacement. In this approach, a shared cache area is divided on a cache way basis, and the number of cache blocks in use is counted for every process running on the system. At the time of cache block replacement, if the counted number of cache blocks does not exceed a designated number, a cache block used by another process is replaced. If there is no replaceable cache block in the same set, a cache block is randomly selected as a candidate for replacement.
A second approach is called column caching, in which a cache area shared by processes is divided on a cache way basis. Each process holds a bit vector designating the ways that are candidates for replacement. When cache way replacement takes place, a candidate for replacement is selected from the ways designated by the bit vector.
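A minimal sketch of how such a bit vector might constrain victim selection is shown below; the function and the age-based tie-break are illustrative assumptions, not the published mechanism.

```c
#include <stdint.h>

#define NUM_WAYS 4

/* Choose a victim only among the ways whose bit is set in the
 * process's bit vector; ages[] is a hypothetical LRU age value
 * per way (larger = older).  Returns -1 if the vector is empty. */
int column_cache_victim(uint8_t bit_vector, const uint32_t ages[NUM_WAYS])
{
    int victim = -1;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!((bit_vector >> w) & 1u))
            continue;              /* way not assigned to this process */
        if (victim < 0 || ages[w] > ages[victim])
            victim = w;
    }
    return victim;
}
```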
A third approach is to dynamically divide a shared cache memory. With this approach, a partition ID (PID) is added to the tag of the cache memory.
A fourth approach is to provide processing unit identification information and a comparator in each of the cache ways constituting a set associative cache memory. At the time of cache way replacement, the number of blocks in the set whose processing unit identification information corresponds to the identification information of the unit which made the access is counted. The counted number of blocks is then compared with a predetermined maximum number of blocks. If the counted number of blocks exceeds the predetermined maximum number, a way which has the same processing unit identification information as the identification information of the unit which made the access is replaced.
A fifth approach is to provide an attribute memorizing tag in each cache way. The attribute memorizing tag stores information including update time and priority regarding replacement of the way. A candidate for way replacement is selected on the basis of the attribute values.
The approaches described above, however, suffer from the following problems. Regarding the first approach, since the number of cache blocks in use must be tracked correctly for all the processes, it is difficult to implement such a cache memory in multi-process environments.
Regarding the second approach, the bit vectors must be controlled so as not to overlap one another in multi-process environments. When, for example, the bit vectors of both a process A and a process B are ‘1100’ in a 4-way set associative cache memory, the processes A and B use only two ways, leaving the remaining two ways unused. In order to make the best use of a shared cache memory, process scheduling must be made by an operating system which recognizes the best combinations.
Regarding the third approach, since the PID is fixedly allocated to the tag, the tag information must be collectively rewritten for a dynamic change in the division state. Such an operation incurs a high overhead. Moreover, the PID may not be flexibly allocated to each process or each virtual machine in, for example, multi-process environments.
Regarding the fourth approach, since the processing unit identification information may be used only for determination on a physical processor basis, the number of ways may not be controlled on a logical program basis, i.e., on a process or virtual machine basis. Moreover, since a comparator for the processing unit identification information is attached to each of the ways, enormous hardware would be required to apply this approach to a cache memory with a capacity of several megabytes or more. When the number of ways having the same processing unit identification information is smaller than the predetermined number, a replacement way is selected from all the ways that may be stored in the set. Thus, if a way allocated to a certain processing unit continues to be the least recently used data in the LRU control, the number of ways allocated to that processing unit may permanently remain 1 and the maximum number of blocks may never be reached. Moreover, since the fourth approach does not describe handling of cache hits, hit information may not be reflected in the LRU information.
In the fifth approach, large additional hardware is required since hardware for recording the update time of each way and other units, such as a timer, must be included. The fifth approach also incurs a high overhead that makes it unsuitable for multi-process environments, and requires rewriting of the attribute tag to dynamically change the allocation.
As described above, the first to fifth related art approaches are difficult to implement because of the large additional hardware they require, and are difficult to operate efficiently in multi-process environments and virtual machine usage environments.
Japanese Laid-open Patent Publication No. 2001-282617 discusses a method of sectioning a shared cache memory. Japanese Laid-open Patent Publications No. 6-149674 and No. 2000-90059 discuss methods of controlling a shared cache memory capable of reducing the occurrence of cache misses. G. E. Suh, S. Devadas, and L. Rudolph, “A new memory monitoring scheme for memory-aware scheduling and partitioning”, Proceedings of the Eighth International Symposium on High-Performance Computer Architecture, pages 117-128, 2-6 Feb. 2002, discusses the LRU algorithm.
According to an aspect of an embodiment, a cache memory operates in accordance with a multi-way set associative system. The cache memory includes an identification information storage for storing identification information identifying the requesting element of a memory access request for each cache block; a replacement cache block candidate determinator for determining, upon an occurrence of a cache miss corresponding to the memory access request, candidates of the cache block to be replaced, on the basis of the identification information attached to the memory access request and the identification information stored in the identification information storage for the cache blocks specified by the memory access request; a replacement cache block selector for selecting a replacement cache block from the candidates; and an identification information updater for updating, upon the occurrence of the cache miss, the identification information stored in the identification information storage for the cache block selected by the replacement cache block selector to the identification information attached to the memory access request.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Referring now to the drawings, embodiments will be described in detail.
A cache memory 101 according to the present embodiment is implemented as a 4-way set associative cache memory. The cache memory 101 is divided into a plurality of cache ways 102 (#1) to 102 (#4). Each cache way 102 includes a plurality of cache blocks 103 (#1) to 103 (#n), where n=1024, for example.
In the present embodiment, each cache block 103 is constituted of a validity flag, a tag and a data field, and a core ID is further stored for each cache block, as described below.
Data size of the cache memory 101 is, for example, 512 kilobytes, which is obtained by calculating (data size of cache block×number of cache indices×number of cache ways = 128 bytes×1024 indices×four ways), where 1024 bytes equal 1 kilobyte.
An address 105 for a memory access designated by a program is constituted of 32 bits, with the lowest 7 bits occupied by an offset in the cache block, intermediate 10 bits occupied by an index and the highest 15 bits occupied by a tag.
When an access for reading data from or writing data to the address 105 is instructed, the cache set of the corresponding block number among (#1) to (#n) is designated by the 10-bit index in the address 105.
Then, each designated cache block 103 (#i) is read from each of the cache ways 102 (#1) to 102 (#4), and is input into each of comparators 104(#1) to 104 (#4).
The comparators 104 (#1) to 104 (#4) detect matching and mismatching between the tag value in the read cache block 103 (#i) and the tag value in the designated address 105. A cache hit is made in the cache block 103 (#i) read at the comparator 104 in which matching between the tag values has been detected among the comparators 104 (#1) to 104 (#4). Then, data is read from or written to the cache block 103 (#i).
When none of the comparators 104 detects matching between the tag values, or when a validity flag represents invalidity even though matching between the tag values is detected, no cache hit is made and the address 105 on the main memory is accessed.
When a data access to the address 105 is instructed, a cache set is designated by a 10-bit index in the address 105. Then, tag values are read from each of the cache ways 102 (#1) to 102 (#4) in the tag RAM 202 and are input into the comparators 104 (#1) to 104 (#4).
The comparators 104 (#1) to 104 (#4) detect matching and mismatching between the tag value of the cache block 103 (#i) read from the tag RAM 202 for each of the cache ways 102 (#1) to 102 (#4) and the tag value in the designated address 105. When a cache hit is made, the four comparators 104 output a 4-bit hit way signal in which only the output of the comparator 104 that detected matching of the tag values is 1.
When a cache hit is made and the memory access request is a readout request, the following read operation is executed. In the data RAM 201, four data values, each corresponding to one of the cache ways 102 (#1) to 102 (#4), are read from the cache block 103 (#i) designated by the 10-bit index in the address 105 and input into a selector 204. In the core ID-RAM 203, four core ID values, each corresponding to one of the cache ways 102 (#1) to 102 (#4), are read from the designated cache block 103 (#i) and input into a selector 205. On the basis of the hit way signal, the selectors 204 and 205 select and output the data value and the core ID of the cache block 103 (#i) of the cache way corresponding to the comparator 104 at which matching of the tag values was detected.
When a cache hit is made and the memory access request is a write request, the following write operation is executed with respect to the block number #i designated by the 10-bit index in the address 105. Data and a core ID designated on the basis of the memory access request are written into the cache block 103 (#i) of the cache way designated by the hit way signal among the cache ways 102 (#1) to 102 (#4).
An operation of the thus-configured embodiment of the cache memory will be described below. The cache memory 101 according to the present embodiment may be shared by processors in a multi-processor system. The cache memory 101 may also be implemented as a cache memory shared by core processors mounted on a chip in an on-chip multi-processor, or as a cache memory shared by threads in a multi-thread processor. Thus, the cache memory 101 may be applied to various computer systems which have a configuration of “requester units” for accessing and a cache memory shared by the “requester units.” In the following description, the “requester unit” accessing the cache memory 101 will be referred to simply as a core (CORE).
In the present embodiment, a core ID identifying the requester of a memory access is stored for each cache block 103, and a maximum number of blocks in a set may be designated for each core ID. In the embodiment of the cache memory 101, way replacement upon a cache miss is controlled on the basis of this information.
As will be described in detail below, when a cache miss occurs, the control operation is executed in the following manner (a code sketch follows the four steps):
Step 1: In the cache set designated by the access address, which is constituted of four ways, the number of cache blocks having a core ID which is the same as the core ID of the requester of the memory access is acquired by, for example, counting.
Step 2: The maximum number of blocks in the set corresponding to the core ID of the requester and the number of the cache blocks acquired in Step 1 are compared to each other.
Step 3: If the number of cache blocks acquired in Step 1 is smaller than the maximum number of blocks in the set, ways having core IDs different from that of the requester are selected as replacement way candidates. If, on the contrary, the acquired number of blocks is equal to or larger than the maximum number of blocks, ways having the core ID of the requester are directly selected as replacement way candidates.
Step 4: A way to be replaced, i.e., a replacement way, is selected among the replacement way candidates on the basis of a replacement policy, such as Least Recently Used (LRU).
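The four steps may be sketched in C as follows. This is a minimal model under the assumptions of this description: per-set core IDs, a per-core maximum number of blocks, and an LRU chooser abstracted as lru_pick(); all names are illustrative.

```c
#include <stdint.h>

#define NUM_WAYS 4

/* Hypothetical per-set state and helpers; only the selection
 * logic of Steps 1 to 4 is modeled.                              */
extern uint8_t way_core_id[NUM_WAYS];         /* core ID of each way */
extern unsigned int max_blocks_of(uint8_t core_id);
extern int lru_pick(uint8_t candidate_mask);  /* Step 4: LRU choice  */

int select_replacement_way(uint8_t req_core_id)
{
    /* Step 1: count the ways already holding the requester's ID. */
    unsigned int owned = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (way_core_id[w] == req_core_id)
            owned++;

    /* Step 2: compare against the requester's maximum number of
     * blocks in the set.                                          */
    int below_limit = owned < max_blocks_of(req_core_id);

    /* Step 3: below the limit, ways of other cores are candidates;
     * at or above it, the requester's own ways are candidates.    */
    uint8_t mask = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        int same = (way_core_id[w] == req_core_id);
        if (below_limit ? !same : same)
            mask |= (uint8_t)(1u << w);
    }

    /* Step 4: select the final replacement way among candidates. */
    return lru_pick(mask);
}
```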
The control circuit which controls replacement of the cache way merely replaces the core ID in the core ID-RAM 203 with the core ID of the requester when a cache hit is made.
Such a control operation has the following advantages. First, when there is data shared by other cores, these cores may continuously share the cache memory, thereby minimizing capacity loss. Second, since the core ID with which a cache hit is made is recorded, the recorded core ID may be used as a history for the next access. In such history-based accesses, for example, a core corresponding to the core ID may select a replacement way on the basis of LRU among the data that have been accessed before.
When a cache hit is made on a way currently allocated to a core having another core ID, the number of cache ways having the same core ID as the requester of the memory access may temporarily exceed the maximum number of blocks in the set. This, however, is a trivial matter: as other cores access the set and replace the blocks exceeding the maximum number of blocks, the number of blocks allocated to each core ID gradually stabilizes at the requested number of blocks.
In the present embodiment, the maximum number of blocks in the set may be determined arbitrarily on a core basis. This is because, during operation of the system, the cache size allocated to each core is dynamically increased or decreased toward the maximum number of blocks in the set by the control operation of Step 3 described above, even if the maximum number of blocks in the set is determined arbitrarily. Even if, for example, the maximum numbers of blocks set for the cores are changed during operation, the allocation gradually converges to the new values.
The maximum number of blocks is set for each core ID as described in the foregoing description. It is, however, also possible to group several cores and collectively set a maximum number of blocks for each group.
As described above, in the embodiment of the cache memory 101, the number of blocks allocated to a particular core may be restricted with a simple conditional judgment.
Next, a first embodiment of the way replacement control in the cache memory 101 will be described.
In the exemplary configuration of the first embodiment, two processor cores 601 (#0) and 601 (#1) share a cache memory 602, which is connected to a main memory 604 via a MAC 603.
Each processor core 601 is connected to the cache memory 602 via a 33-bit address bus 605 and a 32-bit data bus 606. During a memory access, each processor core 601 transmits, on the address bus, a 1-bit core ID identifying its own core together with a 32-bit address. When a cache miss occurs, data is acquired from the main memory 604 through the MAC 603.
A cache controller 718 (an example of an identification information updater) executes the way replacement control described below. A 1-bit core ID 702 for controlling the replacement status of data in the cache memory 101 is added to the memory access request 708; the memory access request 708 thus carries the core ID 702 together with the address 105.
The core ID 702 added to the memory access request 708 is converted into a replacement core ID 717 in the replacement core ID decision circuit 711 and then input into the replacement way selectable mask generation circuit 703.
Two registers (examples of maximum number designators), the maximum number of blocks register 712 for a core 0 and the maximum number of blocks register 713 for a core 1, are connected to the replacement core ID decision circuit 711. Here, “core 0” means the processor core 601 (#0) and “core 1” means the processor core 601 (#1).
Two counting circuits (examples of counters), a bit counting circuit 714 for counting the number of blocks for the core 0 and a bit counting circuit 715 for counting the number of blocks for the core 1, are connected to the replacement core ID decision circuit 711 (an example of a comparator).
The maximum numbers of blocks which may be used by the processor cores 601 (#0) and 601 (#1) in each cache set are set in the maximum number of blocks register 712 for the core 0 and the maximum number of blocks register 713 for the core 1, respectively. These register values may be changed with instructions from the processor cores 601.
Next, the bit counting circuit 714 sequentially reads, via an inverter 716, the 4-bit values of the core IDs corresponding to the four ways read from the core ID-RAM 203 and counts the number of “1” values. Thus, the bit counting circuit 714 counts the number of core IDs having a value of “0” among the core IDs of the four ways corresponding to a designated cache block 103 (#i) read from the core ID-RAM 203. The bit counting circuit 715 sequentially reads the 4-bit values of the core IDs corresponding to the four ways read from the core ID-RAM 203 and counts the number of “1” values. Thus, the bit counting circuit 715 counts the number of core IDs having a value of “1” among the core IDs of the four ways corresponding to the cache block 103 (#i) read from the core ID-RAM 203.
The registers 712 and 713 and the bit counting circuits 714 and 715 each output, for example, 3-bit data; the output 3-bit data has a value of 0 to 4.
In the replacement core ID decision circuit 711, when a core ID=“0” is designated in the memory access request 708, a selector 711-1 selects an output of the register 712 and a selector 711-2 selects an output of the bit counting circuit 714. The selected outputs are then input into a comparator 711-3. When a core ID=“1” is designated in the memory access request 708, the selector 711-1 selects an output of the register 713 and the selector 711-2 selects an output of the bit counting circuit 715. The selected outputs are then input into the comparator 711-3.
Then, the comparator 711-3 compares the maximum number of blocks corresponding to the core ID value designated by the memory access request 708 and the number of blocks having that core ID value in the designated cache block 103 (#i). Here, the maximum number of blocks is the information used for defining the upper limit of the number of ways, i.e., the number of blocks which correspond to each processor core. As described above, the core ID value is the information used for identification of the processor core 601. Thus, the comparator 711-3 compares the maximum number of blocks set to the processor core 601 which issued the memory access request 708 and the number of blocks used by the processor core 601 which corresponds to the core ID designated by the memory access request in the selected cache block 103 (#i).
In the example described here, the core ID 702 added to the memory access request 708 is “1”, and the core IDs of the four ways read from the core ID-RAM 203 for the designated cache block 103 (#i) are “0001”.
As a result of the comparison by the comparator 711-3, if the current number of blocks is smaller than the maximum number of blocks for the designated core ID, the number of blocks for the designated core ID may be increased further. That is, in the designated cache block 103 (#i), the number of blocks corresponding to the processor core 601 which issued the memory access request 708 may be increased further. Thus, the selector 711-5 selects the value obtained by inverting the core ID 702 added to the memory access request 708 with an inverter 711-4, and outputs the selected value as the replacement core ID 717.
If, on the other hand, the current number of blocks is equal to or larger than the maximum number of blocks for the designated core ID, the number of blocks for the designated core ID may not be increased any further. That is, in the designated cache block 103 (#i), the number of blocks corresponding to the processor core 601 which issued the memory access request 708 may not be increased. In this case, the selector 711-5 directly selects the core ID 702 added to the memory access request 708 and outputs it as the replacement core ID 717.
In the example described here, since the number of blocks having the core ID “1” is smaller than the maximum number of blocks for the core 1, the replacement core ID 717 has a value of “0”.
The replacement core ID 717 is input into the replacement way selectable mask generation circuit 703, together with the 4-bit core ID 701 corresponding to the cache ways 102 (#1) to 102 (#4) read from the core ID-RAM 203 for the designated cache block 103 (#i).
The replacement way selectable mask generation circuit 703 includes an exclusive OR circuit (XOR) 703-1 and an inverter (INV) 703-2. The XOR 703-1 compares each bit of the 4-bit core ID 701 with the replacement core ID 717, and the INV 703-2 inverts the result, so that each bit of the replacement way candidate 709 is “1” for a way whose core ID matches the replacement core ID 717.
In the example described here, since the replacement core ID 717 is “0” and the 4-bit core ID 701 is “0001”, the replacement way candidate 709 is “1110”.
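A bitwise C sketch of this mask generation follows, assuming a 1-bit core ID per way packed into a 4-bit word as in this example.

```c
#include <stdint.h>

/* Candidate bit = XNOR of the way's 1-bit core ID and the
 * replacement core ID 717, mirroring XOR 703-1 followed by
 * INV 703-2.  core_ids packs one bit per way, e.g. 0x1 = "0001". */
uint8_t replacement_way_mask(uint8_t core_ids, uint8_t replacement_core_id)
{
    /* Broadcast the 1-bit replacement core ID to all four ways. */
    uint8_t broadcast = replacement_core_id ? 0x0Fu : 0x00u;
    return (uint8_t)(~(core_ids ^ broadcast) & 0x0Fu);
}
```

With core IDs “0001” and a replacement core ID of “0”, the function returns “1110”, matching the example above.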
From the replacement way candidate 709, the LRU selection circuit 704 selects one of the ways corresponding to bit positions with a value of “1” in accordance with an LRU algorithm. The LRU selection circuit 704 outputs 4-bit replacement way instruction information 710 in which only the bit position corresponding to the selected way is “1.”
In the example described here, the LRU selection circuit 704 selects the first way among the candidates, and thus the replacement way instruction information 710 is “1000”.
The selector 705 outputs data corresponding to the memory access request 708 to the way corresponding to the bit position with a value of 1 among the 4-bit data of the replacement way instruction information 710 in the data RAM 201. The selector 706 outputs a tag corresponding to the memory access request 708 to the way corresponding to the bit position with a value of 1 among the 4-bit data of the replacement way instruction information 710 in the tag RAM 202. The selector 707 outputs a core ID corresponding to the memory access request 708 to the way corresponding to the bit position with a value of 1 among the 4-bit data of the replacement way instruction information 710 in the core ID-RAM 203.
An index in the memory access request 708 designates block numbers of the data RAM 201, the tag RAM 202 and the core ID-RAM 203. Then, the data, the tag and the core ID are written into the cache block 103 of the selected way of the designated block number in the data RAM 201, the tag RAM 202 and the core ID-RAM 203, respectively. Areas in which the data, the tag and the core ID are written are blacked out in the drawing.
When the memory access request 708 is a readout request, the data written in the data RAM 201 is the data read from a memory area corresponding to the address 105 on the main memory which is not illustrated. When the memory access request 708 is a writing request, the data written in the data RAM 201 is the data designated in the memory access request 708.
At the time of writing the data, the 4-bit core ID corresponding to the designated block number (#i) in the core ID-RAM 203 is updated from “0001” to “1001” in this example.
A hit way updating circuit 1101 updates the core ID stored in the core ID-RAM 203 when a cache hit is made, as follows.
The hit way updating circuit 1101 executes the following updating operation with respect to the 4-bit core ID 701 read from the core ID-RAM 203.
That is, among the 4-bit core ID 701, the core ID value corresponding to the cache way for which the bit value of the hit way signal is 1 is updated to the core ID 702 attached to the memory access request 708. The hit way updating circuit 1101 then writes the new core ID information 1103 acquired as a result of the update back into the location corresponding to the current cache block 103 (#i) in the core ID-RAM 203.
In this manner, the core ID 701 with which a cache hit is made in the current cache block 103 (#i) is updated to match the core ID 702 designated by the memory access request 708 so that no inconsistency occurs during the process.
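A minimal C sketch of this hit-time update follows, under the same 4-bit packing assumption as the mask sketch above.

```c
#include <stdint.h>

/* Sketch of the hit way updating circuit 1101: the way flagged in
 * the one-hot hit way signal takes the 1-bit core ID attached to
 * the request; the updated 4-bit word (new core ID information
 * 1103) is then written back to the core ID-RAM 203.              */
uint8_t update_core_id_on_hit(uint8_t core_ids,       /* core ID 701 */
                              uint8_t hit_way_signal, /* one-hot     */
                              uint8_t req_core_id)    /* core ID 702 */
{
    if (req_core_id)
        return (uint8_t)(core_ids | hit_way_signal);
    return (uint8_t)(core_ids & (uint8_t)~hit_way_signal);
}
```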
According to the first embodiment of the replacement way control, the number of cache blocks used by each processor core 601 may be restricted with a simple circuit and a small amount of additional hardware.
Next, a second embodiment of the way replacement control in the cache memory 101 will be described. In the second embodiment, the operating system manages a structure for each process.
The structure includes a context (*context), i.e., values to be saved and restored when the process under execution is switched. In the present embodiment, information about the maximum number of blocks in the set for the process, i.e., “way_limit_count”, is also added to the structure. When the process executed on the processor core 601 is switched to another process, i.e., when a context switch is made, the operating system saves the information about the maximum number of blocks in the set into the execution context of the outgoing process and restores it from the execution context of the incoming process, together with the other register values.
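A hypothetical C rendering of such a process structure is shown below; only way_limit_count is taken from the description, and every other field name is illustrative.

```c
/* Hypothetical process structure managed by the operating system;
 * every field name except way_limit_count is illustrative.         */
struct context;                       /* saved register state        */

struct process {
    int             pid;              /* process identifier          */
    struct context *context;          /* saved/restored at a switch  */
    unsigned int    way_limit_count;  /* maximum number of blocks in
                                         the set for this process    */
};
```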
In the configuration of the second embodiment, a maximum number of blocks register 1001 for the core 0 and a maximum number of blocks register 1002 for the core 1 are provided.
The “way_limit_count” value to be set in the maximum number of blocks register 1001 is defined in the setup of the process structure in the program executed in the processor core 601 (#0).
As described above, in the second embodiment of the replacement way control, the maximum number of blocks in the set applied to each memory access from each processor core 601 is set using the process structure. Thus, the amount of use of the cache memory 101 may be controlled on a process basis.
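A minimal sketch of the context-switch hook follows, assuming hypothetical accessors for the process bookkeeping and the register interface.

```c
/* Hypothetical hooks; the real register interface and process
 * bookkeeping are implementation specific.                        */
extern unsigned int saved_way_limit_count(int pid);
extern void write_max_blocks_register(int core, unsigned int count);

/* Context-switch hook: when process `pid` is dispatched on `core`,
 * its saved maximum number of blocks in the set is written to the
 * core's maximum number of blocks register (1001 or 1002).        */
void on_context_switch(int core, int pid)
{
    write_max_blocks_register(core, saved_way_limit_count(pid));
}
```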
Next, a third embodiment of the way replacement control in the cache memory 101 will be described.
The third embodiment is an exemplary software configuration in which a compiler may issue an instruction on the maximum number of blocks in the set.
As an example of a source code 1201 of the application, consider a program in which elements of an array a and an array b are added and the obtained values are substituted into an array c.
In the first step (1202), the compiler analyzes the memory access pattern of the source code 1201 and determines the number of cache blocks to be allocated to the thread.
In the second step (1204), the compiler generates an execution binary including an instruction which sets the determined maximum number of blocks in the set.
When these execution binaries are executed in each processor core 601, the designated maximum number of blocks in the set is set in the maximum number of blocks register of the corresponding core before the thread body is executed.
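A hypothetical illustration of the code the compiler might emit for the source code 1201 is shown below; set_way_limit() stands in for the (unspecified) instruction that loads the maximum number of blocks register, and the limit value is an assumption.

```c
#define N 1024

/* set_way_limit() is hypothetical: it stands in for the instruction,
 * inserted by the compiler, that loads the maximum number of blocks
 * register of the executing core.                                    */
extern void set_way_limit(unsigned int max_blocks_in_set);

void thread_body(const int *a, const int *b, int *c)
{
    /* Inserted in the second compilation step: the streaming access
     * pattern of a, b and c benefits little from caching, so only a
     * small share of the ways is requested (the value 1 is assumed). */
    set_way_limit(1);

    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];        /* source code 1201: c = a + b     */
}
```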
As described above, in the third embodiment of the replacement way control, the compiler allocates the cache memory 101 to each thread to optimize total performance of a computer system.
Next, a fourth embodiment of the way replacement control in the cache memory 101 will be described.
In the configuration of the fourth embodiment, a number of cache misses register 1301 for the core 0 and a number of cache misses register 1302 for the core 1 are added to the replacement way control circuit.
The cache controller 718 increments the value of the number of cache misses register 1301 or 1302 corresponding to the core ID attached to the memory access request 708 each time a cache miss occurs.
In this manner, the number of memory access requests 708 which caused cache misses is counted for each processor core 601 which issued the memory access requests 708. Although the dedicated registers 1301 and 1302 are provided in the exemplary configuration described here, the counting mechanism is not limited to this configuration.
Each value of the number of cache misses registers 1301 and 1302 in the replacement way control circuit is referred to by a way limit scheduler 1303 mounted on the operating system.
The way limit scheduler 1303 refers to the values in the number of cache misses registers 1301 and 1302 at regular time intervals. The way limit scheduler 1303 then updates the maximum number of blocks of the processor core 601 (#0) allocated to the maximum number of blocks register 712 for the core 0 and the maximum number of blocks of the processor core 601 (#1) allocated to the maximum number of blocks register 713 for the core 1, based on the values of the number of cache misses registers 1301 and 1302. As a result, cache allocation to the processor core 601 (#0) and to the processor core 601 (#1) is dynamically optimized.
The values of the number of cache misses registers 1301 and 1302 are cleared after a certain period of time or each time a thread executed by each processor core 601 is completed. Thus, in the fourth embodiment of the replacement way control, the number of cache misses in the cache memory 101 is dynamically counted on a core ID (i.e., processor core) basis and cache allocation is optimized on the basis of the counted result. With this configuration, the number of cache misses in the entire system may be minimized.
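One possible policy for the way limit scheduler 1303 may be sketched as follows; the register accessors and the one-way-at-a-time rebalancing rule are assumptions, not the specified algorithm.

```c
#define NUM_WAYS 4

/* Hypothetical accessors for the registers of the fourth embodiment. */
extern unsigned int read_miss_counter(int core);  /* registers 1301/1302 */
extern void clear_miss_counter(int core);
extern unsigned int read_max_blocks(int core);    /* registers 712/713   */
extern void write_max_blocks(int core, unsigned int count);

/* Called at regular intervals: shift one way of cache budget toward
 * the core currently suffering more misses.                          */
void way_limit_scheduler_tick(void)
{
    unsigned int m0 = read_miss_counter(0);
    unsigned int m1 = read_miss_counter(1);

    if (m0 > m1 && read_max_blocks(0) < NUM_WAYS && read_max_blocks(1) > 0) {
        write_max_blocks(0, read_max_blocks(0) + 1);
        write_max_blocks(1, read_max_blocks(1) - 1);
    } else if (m1 > m0 && read_max_blocks(1) < NUM_WAYS && read_max_blocks(0) > 0) {
        write_max_blocks(1, read_max_blocks(1) + 1);
        write_max_blocks(0, read_max_blocks(0) - 1);
    }
    clear_miss_counter(0);
    clear_miss_counter(1);
}
```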
Next, a fifth embodiment of the way replacement control in the cache memory 101 will be described.
The fifth embodiment is an exemplary system configuration with which a maximum number of blocks may be set for each of a plurality of virtual machines. In order to implement a virtual machine, hypervisor software (hereinafter referred to as “HV”) is provided between the operating system and the actual hardware. The HV manages virtual machine information and allocates an actual core to a virtual machine.
In the fifth embodiment, the value of “way_limit_count” is also saved when the virtual machine context is saved, and is restored at the time of restoration. At the time of restoration, the restored “way_limit_count” value is set in the maximum number of blocks register 1001 or 1002 of the core allocated to the virtual machine.
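A minimal sketch of the hypervisor-side save and restore follows, with hypothetical context fields and register accessors.

```c
/* Hypothetical virtual machine context kept by the hypervisor (HV);
 * guest_regs is illustrative.                                        */
struct vm_context {
    unsigned long long guest_regs[32];
    unsigned int       way_limit_count; /* maximum number of blocks   */
};

extern unsigned int read_max_blocks_register(int core);
extern void write_max_blocks_register(int core, unsigned int count);

/* Saved on VM exit and restored on VM entry, so the cache allocation
 * follows the virtual machine rather than the physical core.         */
void hv_save(struct vm_context *vm, int core)
{
    vm->way_limit_count = read_max_blocks_register(core);
}

void hv_restore(const struct vm_context *vm, int core)
{
    write_max_blocks_register(core, vm->way_limit_count);
}
```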
Thus, in the fifth embodiment of the replacement way control, the amount of use of the cache block in the cache memory 101 may be controlled for each virtual machine. As described above, according to the disclosed embodiment, the cache memory area may be arbitrarily divided on a way basis with small additional hardware cost. It is thus possible to optimize performance of the cache memory and to control conflict between processes or virtual machines over the cache memory, thereby improving effective performance of the processor.
The embodiment may be effectively applied to the following fields for the purpose of executing programs at a high speed in highly efficient processors.
Optimization of cache performance by a programmer or a compiler
Optimization of virtual machine performance
Optimization of process scheduling by an operating system
The embodiment may be applied to a cache memory shared by processors in a multi-processor system and to a cache memory shared by core processors mounted on a chip in an on-chip multi-processor. Further, the embodiment may be applied to a cache memory shared by threads in a multi-thread processor.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the embodiment and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the embodiment. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.