This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-065378, filed Mar. 27, 2013, the entire contents of which are incorporated herein by reference.
Embodiments relate to a multi-core processor and a control method.
In recent years, attention has been paid to non-volatile memories such as MRAM (Magnetic Random Access Memory). Replacing a volatile memory, generally used as a cache memory for a processor, with a non-volatile memory is expected to reduce leakage power and to allow for individual small-scale power shutdowns for inactive processors, thus reducing power consumption.
On the other hand, the non-volatile memory generally involves a longer latency and higher access power than the volatile memory. Because of these characteristics, simple replacement of the volatile memory with the non-volatile memory may disadvantageously lead to degradation of performance or an increase in access power.
According to an embodiment, a multi-core processor is capable of executing a plurality of tasks. The multi-core processor includes at least a first core and a second core. The first core and the second core are capable of accessing a shared memory area. The first core includes one or more memory layers in an access path to the shared memory area, the one or more memory layers including a local memory for the first core. The second core includes one or more memory layers in an access path to the shared memory area, the one or more memory layers including a local memory for the second core. The local memory for the first core and the local memory for the second core include memories with different unit cell configurations in at least one identical memory layer.
In the embodiments described below, examples of a configuration of a multi-core processor are illustrated. Each of the multi-core processors according to the embodiments comprises a plurality of cores provided in one die to execute calculations. The cores can access a shared memory area. Each of the cores comprises at least one memory layer provided in an access path to the shared memory area. The memory layer includes a local memory. In each of the multi-core processors according to the embodiments, at least two local memories in an identical layer comprise memories with different unit cell configurations.
The “core” refers to a calculation device that executes a calculation for each instruction. The “instruction” represents a function that defines a type of calculation that can be executed by the core. An “instruction set” represents a group of instructions that can be carried out by the core.
The “shared memory area” is a memory area shared by a plurality of cores and in which different cores can access the same data. For example, a main memory device is a shared memory area.
The “memory layer” refers to a group of memories which can store data from the shared memory area and which are accessed by the core at different speeds. For example, a group of memories comprising a register, an L1 cache, and an L2 cache is a memory layer.
The “memories in the same layer” represents memories at an equal logical distance from the core. For example, in a configuration comprising two cores, a first core and a second core, each of the cores comprising an L1 cache and an L2 cache, the L1 cache for the first core and the L1 cache for the second core are memories in the same layer. The L2 cache for the first core and the L2 cache for the second core are also memories in the same layer. The L1 cache for the first core and the L2 cache for the second core are not memories in the same layer. The L1 cache, the L2 cache, and an L3 cache may be physically different memories or memory areas resulting from logical division of a physical memory.
The “local memory” represents a memory area that a certain core can access faster than the other cores.
The “memories with different unit cell configurations” represents memories some or all of whose memory cells are different from one another in a physical principle for storage of information or in a transistor level circuit. For example, a volatile memory and a non-volatile memory are memories with different unit cell configurations. As a specific example, SRAM and MRAM are a volatile memory and a non-volatile memory, respectively, that is, memories with different unit cell configurations. MRAM and ReRAM (Resistance Random-Access Memory) are both non-volatile memories but have different unit cell configurations. MRAM and PRAM (Phase Change RAM) are also both non-volatile memories but have different unit cell configurations. Furthermore, 6-transistor SRAM and 8-transistor SRAM are both SRAMs but have different unit cell configurations. On the other hand, the following are not memories with different unit cell configurations: two memories which are the same in the physical principle for the storage of information and in the transistor level circuit and which are different from each other in capacity, latency, or the like. Similarly, memories different from one another only at a physical level are not memories with different unit cell configurations. For example, 6-transistor SRAMs different from one another only in the manufacturing process utilized are not memories with different unit cell configurations.
[Memory Configuration]
As shown in
The first core 100 and the second core 200 both utilize SRAMs, which are volatile memories, as the L1 instruction caches (101, 201) and the L1 data caches (102, 202), and utilize MRAM, which is a non-volatile memory, as the shared L3 cache 400.
Furthermore, the first core 100 utilizes MRAM as the L2 cache 103, and the second core 200 utilizes SRAM as the L2 cache 203. For the first core 100, a path from the first core 100 to the L3 cache 400 is SRAM (L1 caches 101 and 102)→MRAM (L2 cache 103)→MRAM (L3 cache 400). In contrast, for the second core 200, a path from the second core 200 to the L3 cache 400 is SRAM (L1 caches 201 and 202)→SRAM (L2 cache 203)→MRAM (L3 cache 400). Thus, the first core 100 and the second core 200 have different memory configurations.
In the first embodiment, MRAM and SRAM are used as an example of memories with different unit cell configurations. However, such memories are not limited to the combination of MRAM and SRAM. Any combination of memories may be used as long as the memories have different unit cell configurations. The memories and configurations in the layers other than the L2 cache are not limited to those of the first embodiment. For example, the L1 cache may be MRAM instead of SRAM, and the L3 cache may be SRAM instead of MRAM. Furthermore, a position where the bus is provided is not limited to the position in
For simplification of description,
As shown in
As shown in
Of course, MRAM may be utilized as the tag memory line 105 and the line memory array 106 in the L2 cache 103 for the first core 100. SRAM may be utilized as the tag memory line 205 and the line memory array 206 in the L2 cache 203 for the second core 200.
[Hardware Control Scheme]
A hardware control scheme for the multi-core processor shown in
A control scheme used to reference data in each of the modules providing the multi-core processor shown in
[Software Control Scheme]
A processing management unit 20 shown in
The processing information table 21 is a table in which information on each type of processing is recorded. The core information table 22 is a table in which information on each core is recorded. The interface unit 24 has an input/output function to exchange information with hardware (multi-core processor 10). The scheduler 23 allocates processing to hardware (any one of the cores of the multi-core processor 10) via the interface unit 24 based on information in the processing information table 21 and the core information table 22. Furthermore, the scheduler 23 receives information from the hardware via the interface unit 24 to update the contents of the processing information table 21 and the core information table 22. The processing management unit 20 may be implemented using software. A program for the processing management unit 20 may be executed in the first core 100 or second core 200 in
According to the first embodiment, the type of the local memory for the core is expressed as a character string, which is recorded. However, the type need not necessarily be expressed as a character string, and any information may be used which enables the scheduler 23 to identify the characteristics of the core. For example, a specification may be pre-provided such that MRAM corresponds to a value “1” and that SRAM corresponds to a value “2”. In the core information table 22, “1” may be recorded as the local memory type for the core ID1, and “2” may be recorded as the local memory type for the core ID2. In the example illustrated in
Several techniques are possible for allocating processing to the cores (scheduling processing for the cores). In the first embodiment, examples of the following will be described: a technique (1) for static scheduling based on pre-execution provision information and two techniques ((2) and (3)) for dynamic scheduling in view of execution efficiency, and a technique (4) that is a combination of the three techniques.
The scheduling technique is not limited to the above-described techniques. For example, the scheduling may be carried out in view of power consumption, the temperature of the processor, or a combination of performance, power consumption, temperature, and the like.
In the multi-core processor in
In general, MRAM involves a longer latency (lower speed) but a larger storage capacity per unit area (hereinafter simply referred to as a “capacity”) than SRAM. Conversely, SRAM involves a shorter latency (higher speed) but a smaller capacity than MRAM. In other words, when the L2 cache 103 for the first core 100 and the L2 cache 203 for the second core 200 are arranged on the die 10 so as to have the same area, the two types of memories are in a trade-off relation in terms of latency and capacity. Thus, when a certain type of processing is carried out, which core (the first core 100 or the second core 200) achieves a higher execution efficiency depends on the characteristics of the processing executed. Ideally, the first core 100 is allocated processing whose execution efficiency is affected by capacity (cache misses) more significantly than by latency, and the second core 200 is allocated processing whose execution efficiency is affected by latency more significantly than by capacity.
(1) Allocation Based on Pre-Execution Provision Information
A technique will be described in which, before a program is executed, core allocation information for processing is specified and in which the scheduler 23 allocates processing to the cores in accordance with a processing attribute based on the core allocation information.
In the first embodiment, information on the allocation target core is expressed as a character string. However, the information may be in any form as long as the information allows the scheduler 23 to determine the core to be allocated. For example, a specification is pre-provided such that the processing attribute to be allocated to the core with MRAM as a local memory corresponds to the value “1” and that the processing attribute to be allocated to the core with SRAM as a local memory corresponds to the value “2”. The value “1” may be recorded as the processing attribute of the processing ID 0x1, and the value “2” may be recorded as the processing attribute of the processing ID 0x12. Alternatively, instead of these values, core IDs may be recorded.
Any technique for specifying pre-execution provision information on processing may be used as long as the processing management unit 20 can identify information indicating to which core processing is to be allocated. For example, as a possible technique, a programmer provides information while describing a program, and compiles the program to embed the pre-execution provision information in binary data. Furthermore, during the last execution, information on the core to be allocated may be recorded in the processing information table 21. A possible technique for providing information while describing a program involves specifying, as an argument, the processing attribute “MRAM”, indicating that the processing is to be allocated to the core with MRAM as a local memory, for example, as shown in
The scheduler 23 references the processing information table 21 to obtain information indicating the type of the memory (processing attribute) for the core to which the target processing is to be allocated. For example, when allocating the processing ID 0x1, the scheduler 23 determines that the processing is to be allocated to the core with MRAM as a local memory based on the contents of the processing information table 21 in
The scheduler 23 need not necessarily allocate the processing to the cores strictly in accordance with the processing attribute. For example, in the core to which the processing is to be allocated, another processing may be in execution. In such a case, the processing may be allocated to a core not specified in the processing attribute item in view of load balancing.
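The attribute-based allocation in (1), together with the load-balancing caveat above, can be sketched as follows. This is an illustrative Python sketch; the table layouts, function names, and the fallback policy are assumptions, not part of the embodiment.

```python
# A sketch, under assumed table layouts and names, of the attribute-based
# allocation in (1) with a load-balancing fallback.

def allocate_by_attribute(pid, processing_table, core_table, busy_cores=()):
    """Return the core ID to which the processing should be allocated."""
    attribute = processing_table[pid].get("attribute")  # e.g. "MRAM"/"SRAM"
    # Cores whose local memory matches the recorded processing attribute.
    candidates = [cid for cid, mem in core_table.items() if mem == attribute]
    # In view of load balancing, fall back to any idle core when every
    # preferred core is busy with another type of processing.
    for cid in candidates + list(core_table):
        if cid not in busy_cores:
            return cid
    # All cores busy: queue behind a preferred core (or any core).
    return candidates[0] if candidates else next(iter(core_table))

core_table = {1: "MRAM", 2: "SRAM"}
processing_table = {0x1: {"attribute": "MRAM"}, 0x40: {"attribute": "SRAM"}}
print(allocate_by_attribute(0x1, processing_table, core_table))        # 1
print(allocate_by_attribute(0x1, processing_table, core_table, (1,)))  # 2
```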
(2) Processing Allocation Based on Execution Efficiency Information
When, for example, information on processing fails to be provided before the processing is carried out, the processing is allocated based on another certain type of information while the processing is in execution. Here, a technique is illustrated in which the scheduler 23 executes processing allocation based on information on the execution efficiency.
The “execution efficiency” is any information that can express the execution efficiency of processing in a certain core. The first embodiment utilizes, for example, IPC (the number of instructions carried out per clock) as the execution efficiency. The execution efficiency is not limited to the IPC but various indicators may be utilized as the execution efficiency. For example, the information representing the execution efficiency may be an IPS (the number of instructions carried out per second), the number of execution clock cycles, power consumption, or performance per unit power consumption.
In the multi-core processor shown in
First, the scheduler 23 allocates the processing to the core ID1 of the core with MRAM as a local memory. The first core 100, which corresponds to the core ID1, starts carrying out the allocated processing.
The scheduler 23 starts acquiring execution information using a performance counter or the like when a trigger event is generated. When the next trigger event is generated, the scheduler 23 records the value of the IPC in an “IPC in ID1 core” item in the processing information table 21, shown in
When the next trigger event is generated, the scheduler 23 compares the magnitudes of the “IPC in the ID1 core” and the “IPC in the ID2 core”, both recorded in the processing information table 21, and shifts the processing to the core with the larger value. For example, for the processing ID 0x1 in
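The IPC comparison described above can be sketched as follows; the table layout and names are hypothetical, not taken from the embodiment.

```python
# A minimal sketch of the IPC-based dynamic allocation in (2).

def choose_core_by_ipc(entry):
    """Once the IPC has been measured in both cores, keep the processing
    on the core that achieved the larger IPC."""
    return 1 if entry["ipc_core1"] >= entry["ipc_core2"] else 2

# Trigger sequence: run on the ID1 core and record its IPC, shift to the
# ID2 core and record its IPC, then compare the two recorded values.
processing_table = {0x1: {"ipc_core1": 1.5, "ipc_core2": 2.2}}
print(choose_core_by_ipc(processing_table[0x1]))  # 2: the ID2 core wins
```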
(3) Allocation Based on Execution Efficiency Decrement Information
Another technique is illustrated in which dynamic processing allocation is carried out while processing is in execution as is the case with the processing allocation based on the IPC information described in “(2) processing allocation based on execution efficiency information”. In such an architecture as shown in
[Example of Initial MRAM Core Allocation]
The dynamic processing allocation (scheduling) carried out when the processing is initially allocated to the first core 100 (the core with MRAM as a local memory) will be described with reference to a flowchart in
First, the scheduler 23 allocates the processing to the first core 100 via the interface unit 24. The first core 100 executes the processing and performs measurement of a latency dependent execution efficiency decrement and measurement of a cache miss dependent execution efficiency decrement (step S1). The latency dependent execution efficiency decrement is the degree of a decrease in the execution efficiency of the core attributed to an amount of time from issuance of a request by the core until data requested by the core is transferred to the core when the data is present in a target memory. The cache miss dependent execution efficiency decrement is the degree of a decrease in the execution efficiency of the core attributed to an amount of time from issuance of the request by the core until the data requested by the core is transferred to the core when the data is not present in a target memory, that is, when a cache miss occurs.
In the first embodiment, the “target memory” is the L2 cache. Furthermore, the “execution efficiency decrement” is a numerical value representing the degree of a decrease in the execution efficiency of the core. The execution efficiency decrement may be, for example, the ratio of the duration of stalling of the core to the total execution duration, the duration of stalling of the core (for example, the actual duration or the number of clock cycles), or the rate of non-utilization of a calculator present in the core. The duration as used herein may be measured in units of time or in units of events in the core such as the number of clock cycles. The most direct technique for obtaining the above-described information is to measure the number of cycles in which the core stalls, using the performance counter or the like. However, when no performance counter with such a function is present, information from any other type of performance counter may be used for approximate calculations. The latency dependent execution efficiency decrement may be calculated, for example, based on the number of hits to the target memory per instruction. The cache miss dependent execution efficiency decrement may be calculated, for example, based on the number of cache misses per instruction.
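The approximate calculation of the two decrements from ordinary performance-counter values, mentioned above, can be sketched as follows. The counter values and cycle counts are assumed numbers chosen for illustration, not measured data.

```python
# An illustrative approximation of the two execution efficiency decrements
# from per-instruction hit and miss counts, as the text suggests.

def latency_dependent_decrement(l2_hits, instructions, hit_latency_cycles):
    # Stall cycles per instruction attributed to the target memory's latency.
    return (l2_hits / instructions) * hit_latency_cycles

def cache_miss_dependent_decrement(l2_misses, instructions, miss_penalty_cycles):
    # Stall cycles per instruction attributed to cache misses.
    return (l2_misses / instructions) * miss_penalty_cycles

print(latency_dependent_decrement(2000, 10_000, 20))     # 4.0
print(cache_miss_dependent_decrement(100, 10_000, 200))  # 2.0
```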
The information acquired using the above-described technique is obtained by the scheduler 23 from hardware via the interface unit 24. As shown in
When a trigger event is generated, the scheduler 23 determines which of the two execution efficiency decrements, the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement, is larger based on the information resulting from the measurement in step S1 (step S2). Any trigger event may be used as long as the scheduler 23 can detect the trigger event. The trigger event may be, for example, the start/end of a process, the start/end of a thread, an interruption, or execution of a special instruction. The trigger event may be an instruction provided every given time or an instruction of every given number of instructions. The trigger event may be generated at every given number of cycles. In the illustrated example, when a trigger event is generated, the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement are already recorded in the processing information table 21. However, the recording of the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement may be carried out simultaneously with a trigger event or appropriately before the trigger event. Furthermore, the magnitudes of the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement are compared when a trigger event is generated. However, the comparison may instead be carried out when both decrements are recorded in the processing information table 21. For example, when a policy is used in which the cache miss dependent execution efficiency decrement is subtracted from the latency dependent execution efficiency decrement, the scheduler 23 can determine that the cache miss dependent execution efficiency decrement is larger when the result is a negative number and determine that the latency dependent execution efficiency decrement is larger when the result is a positive number.
When the result of the magnitude determination in step S2 shows that the cache miss dependent execution efficiency decrement is larger as is the case with the processing ID 0x1 in
On the other hand, when the result of the magnitude determination in step S2 shows that the latency dependent execution efficiency decrement is larger as is the case with a processing ID 0x40 in
The core change threshold is a parameter for adjusting the ease with which the processing is shifted between the cores. For example, the core change threshold may be a pre-provided parameter or may be calculated based on an overhead involved in the shift between the cores or the dominance ratio of the latency dependent execution efficiency decrement or the cache miss dependent execution efficiency decrement to the time interval between trigger events. For example, even when the result of the magnitude determination in step S2 shows that the latency dependent execution efficiency decrement is larger as is the case with a processing ID 0x00 in
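The magnitude test and the core change threshold described above can be sketched as follows for the initial MRAM-core allocation; the threshold value and the names are assumptions made for illustration.

```python
# A sketch of the magnitude determination with a core change threshold for
# processing initially allocated to the MRAM core (core ID1).

CORE_CHANGE_THRESHOLD = 1.0  # assumed units: stall cycles per instruction

def next_core_from_mram(latency_dec, miss_dec, threshold=CORE_CHANGE_THRESHOLD):
    """Return the core the processing should run on next.

    The MRAM core (ID1) is kept when the cache miss dependent decrement
    dominates; the processing moves to the SRAM core (ID2) only when the
    latency dependent decrement is larger by at least the threshold."""
    if latency_dec - miss_dec >= threshold:
        return 2   # latency-bound: reallocate to the SRAM core
    return 1       # capacity-bound, or the difference is too small to pay
                   # the overhead of a shift: stay on the MRAM core

print(next_core_from_mram(0.5, 3.0))  # 1: miss-dominated, stay on ID1
print(next_core_from_mram(4.0, 1.0))  # 2: latency-dominated, move to ID2
print(next_core_from_mram(1.5, 1.0))  # 1: difference below the threshold
```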
[Example of Initial SRAM Core Allocation]
Dynamic allocation carried out when the processing is initially allocated to the second core 200 (the core with SRAM as a local memory) will be described in accordance with a flowchart in
First, the scheduler 23 allocates the processing to the second core 200 via the interface unit 24. The second core 200 executes the processing and then performs measurement of the latency dependent execution efficiency decrement and measurement of the cache miss dependent execution efficiency decrement (step S1).
The scheduler 23 records the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement in the processing information table 21 for each ID that can identify processing, as shown in
The scheduler 23 determines which of the two execution efficiency decrements, the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement, is larger based on the information resulting from the measurement in step S1 (step S2).
When the result of the magnitude determination in step S2 shows that the latency dependent execution efficiency decrement is larger as is the case with a processing ID 0x100 in
On the other hand, when the result of the magnitude determination in step S2 shows that the cache miss dependent execution efficiency decrement is larger, as is the case with a processing ID 0x140 in
Even when the result of the magnitude determination in step S2 shows that the cache miss dependent execution efficiency decrement is larger as is the case with a processing ID 0x180 in
The “(3) allocation based on execution efficiency decrement information” may be carried out in a simpler form. The above-described example uses the two pieces of execution efficiency information, the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement, and the thresholds. However, it is possible to perform control using only one of the two pieces of execution efficiency information and the threshold. An example is illustrated below.
For the “example of initial MRAM core allocation”, a scheme is possible in which, for example, only the latency dependent execution efficiency decrement is measured so that, when the measurement is equal to or larger than the threshold, the processing is reallocated to the SRAM core. This control is equivalent to the control scheme in
For the “example of initial SRAM core allocation”, a scheme is possible in which, for example, only the cache miss dependent execution efficiency decrement is measured so that, when the measurement is equal to or larger than the threshold, the processing is reallocated to the MRAM core. This control is equivalent to the control scheme in
When such control is performed, each of the processing information tables in
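The simplified single-metric variants described above can be sketched as follows; the threshold values are assumptions.

```python
# A sketch of the simplified control in which only one decrement is
# measured and compared against a threshold.

def reallocate_from_mram(latency_dec, threshold=2.0):
    # Initial MRAM allocation: move to the SRAM core (ID2) only when the
    # latency dependent decrement alone reaches the threshold.
    return 2 if latency_dec >= threshold else 1

def reallocate_from_sram(miss_dec, threshold=2.0):
    # Initial SRAM allocation: move to the MRAM core (ID1) only when the
    # cache miss dependent decrement alone reaches the threshold.
    return 1 if miss_dec >= threshold else 2

print(reallocate_from_mram(3.0), reallocate_from_mram(0.5))  # 2 1
print(reallocate_from_sram(3.0), reallocate_from_sram(0.5))  # 1 2
```

With this form, only one decrement item per direction needs to be recorded in the processing information table 21, at the cost of ignoring the other component of the stall time.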
(4) Processing Allocation Based on Combination
Scheduling based on a combination of (1) to (3) described above may be carried out on the multi-core processor in
(General procedure 1) The scheduling in (3) is carried out, and when the allocation of the processing to the core need not be changed, the local memory of the core carrying out the processing is recorded in the processing information table 21 as a processing attribute. The procedure then proceeds to (General procedure 3) described below. When the allocation of the processing to the core is changed, the procedure proceeds to (General procedure 2).
(General procedure 2) The IPCs of the cores are measured before and after a change in allocation. Based on the results of measurement of the IPCs, the scheduling in (2) is carried out to identify the optimum core. The local memory of the identified optimum core is recorded in the processing information table 21 as a processing attribute.
(General procedure 3) For the second and subsequent executions of the processing, when a processing attribute has been recorded, the scheduling in (1) is carried out based on the processing attribute information.
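The three general procedures can be sketched as follows. The helper functions stand in for the scheduling in (2) and (3) and are hypothetical; only the dispatch logic mirrors the procedures above.

```python
# A sketch of (General procedure 1)-(General procedure 3): record a
# processing attribute once the suitable core is known, then reuse it.

def combined_schedule(pid, table, schedule_by_decrements, schedule_by_ipc):
    """Return the local-memory type ("MRAM"/"SRAM") of the core to use."""
    entry = table[pid]
    if entry.get("attribute"):                 # (General procedure 3):
        return entry["attribute"]              # attribute recorded -> use (1)
    keep, local_memory = schedule_by_decrements(pid)   # scheduling (3)
    if keep:                                   # (General procedure 1)
        entry["attribute"] = local_memory
        return local_memory
    best = schedule_by_ipc(pid)                # (General procedure 2): (2)
    entry["attribute"] = best
    return best

table = {0x40: {}}
# Stand-ins: (3) decides the allocation must change; (2) then picks SRAM.
first = combined_schedule(0x40, table, lambda p: (False, "MRAM"),
                          lambda p: "SRAM")
second = combined_schedule(0x40, table, lambda p: (False, "MRAM"),
                           lambda p: "SRAM")
print(first, second)  # SRAM SRAM: the second run reuses the attribute
```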
The details of an algorithm for the scheduling are shown in a flowchart in
The processing information table 21 used in the present example is shown in
Upon starting carrying out the processing, the scheduler 23 checks the processing attribute item in the processing information table 21 in
As illustrated in the example in (3), for the processing 0x1, the core allocation need not be changed, and thus the shift of the processing is omitted, with the first core 100 continuously carrying out the processing. In this case, “MRAM”, which is indicative of information on the local memory for the first core 100, is recorded in the processing attribute item. Similarly, for the processing 0x80, the core allocation need not be changed. However, the latency dependent execution efficiency decrement is not very large compared to the cache miss dependent execution efficiency decrement, and the processing is not determined to be suitable for the first core 100. Thus, no information is recorded in the processing attribute item. The core allocation needs to be changed for the processing 0x40. Thus, the core allocation is changed with no information recorded in the processing attribute item.
For the processing 0x40, the second core 200 starts carrying out the processing after the core allocation is changed. Upon detecting a trigger event, the scheduler 23 measures the IPC of the processing 0x40 during execution in the second core 200, and records the IPC in the processing information table 21 (step S14).
The IPC in the second core 200 is assumed to be 2.2. At the same time, the scheduler 23 compares the magnitudes of the IPC in the ID1 core, 1.5, and the IPC in the ID2 core, 2.2 (step S15). In this example, the IPC in the ID2 core is larger than the IPC in the ID1 core, and thus, the scheduler 23 determines that the core allocation need not be changed. The scheduler 23 records SRAM, which is information indicative of the local memory for the second core 200, as a processing attribute for the processing ID 0x40.
When the processing with the processing ID 0x1 or the processing with the processing ID 0x40 is carried out again, the scheduling in (1) may be used. The scheduler 23 checks the processing attribute item in the processing information table 21 in
After determining the appropriate core using the above-described technique, the scheduler 23 may measure the IPC in the core carrying out the processing each time a trigger event is generated (step S17). The scheduler 23 compares the IPC measured at the time of the last trigger event and the IPC measured at the time of the current trigger event, both of which are recorded in the processing information table 21 (step S18). When the change of the IPC is equal to or larger than the IPC threshold, the scheduler 23 determines that the characteristics of the processing have changed. The scheduler 23 then executes scheduling to select the appropriate core again (the scheduling is carried out in the following order: (3)→(2)→(1)). During the measurement of the IPC, the latency dependent execution efficiency decrement and the cache miss dependent execution efficiency decrement may be continuously measured in preparation for a change in the characteristics of the processing, or the measurement may be resumed after a change in the characteristics of the processing is detected.
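The re-scheduling trigger in steps S17 and S18 can be sketched as follows; the IPC threshold value is an assumption.

```python
# A sketch of detecting a change in the characteristics of the processing
# by comparing the IPC at the last and current trigger events.

IPC_THRESHOLD = 0.5  # assumed value of the IPC threshold

def characteristics_changed(last_ipc, current_ipc, threshold=IPC_THRESHOLD):
    # A change of at least the threshold means the appropriate core should
    # be selected again (scheduling order: (3) -> (2) -> (1)).
    return abs(current_ipc - last_ipc) >= threshold

print(characteristics_changed(2.2, 2.3))  # False: within the threshold
print(characteristics_changed(2.2, 1.2))  # True: re-run the scheduling
```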
The allocation of the processing to the core need not necessarily be carried out strictly in accordance with the policies of the scheduling in (1) to (4) described above. For example, in the core to which the processing is to be allocated in accordance with the scheduling in (1) to (4), another type of processing may be in execution. In such a case, in view of factors such as load balancing, the processing may be allocated to a core other than the core determined in accordance with the scheduling in (1) to (4), or the allocation of the processing to the core may be postponed or halted. Such scheduling may be implemented by combining the scheduling in (1) to (4) with a scheduling technique intended for load balancing.
In the example illustrated in the first embodiment, the heterogeneous memory configuration is applied to the L2 cache. In an example illustrated in a second embodiment, the heterogeneous memory configuration is applied to the L1 cache.
In the second embodiment, MRAM is utilized as each of L1 caches 107 and 108 for a first core 100 provided in a die 30, and SRAM is utilized as each of L1 caches 207 and 208 for a second core 200 provided in the die 30. For the first core 100, a path from the first core 100 to the L3 cache 400 is MRAM (L1 caches 107 and 108)→MRAM (L2 cache 103)→MRAM (L3 cache 400). For the second core 200, a path from the second core 200 to the L3 cache 400 is SRAM (L1 caches 207 and 208)→MRAM (L2 cache 203)→MRAM (L3 cache 400). Thus, the first core 100 and the second core 200 have memory configurations with different unit cell configurations.
As illustrated in
A hardware control method for the multi-core processor according to the present embodiment may be similar to the hardware control method according to the first embodiment. Furthermore, for a software control method, the scheduling in (1) to (4) may be utilized as is the case with the first embodiment. However, the software control method is not limited to these schemes.
In the first embodiment and the second embodiment, the multi-core processor with the uniform cores is illustrated. In a third embodiment, a multi-core processor with nonuniform cores is illustrated.
As shown in
For the first core 500, a path from the first core 500 to the L3 cache 400 is MRAM (L1 caches 501 and 502)→MRAM (L2 cache 503)→MRAM (L3 cache 400). In contrast, for the second core 600, a path from the second core 600 to the L3 cache 400 is SRAM (L1 caches 601 and 602)→MRAM (L2 cache 603)→MRAM (L3 cache 400). Thus, the first core 500 and the second core 600 have memory configurations with different unit cell configurations.
As illustrated in
A hardware control method for the multi-core processor according to the present embodiment may be similar to the hardware control method according to the first embodiment. Furthermore, for a software control method, the scheduling in (1) to (4) may be utilized as is the case with the first embodiment. However, the software control method is not limited to these schemes.
According to the first to third embodiments, it is assumed that all the cores comprise the same instruction set. A fourth embodiment relates to a multi-core processor comprising a plurality of cores mounted therein and having different instruction sets.
In a configuration shown in
For the first core 700, a path from the first core 700 to the L3 cache 400 is MRAM (L1 caches 701 and 702)→MRAM (L2 cache 703)→MRAM (L3 cache 400). On the other hand, for the second core 800, a path from the second core 800 to the L3 cache 400 is SRAM (L1 cache 801)→MRAM (L2 cache 802)→MRAM (L3 cache 400). Thus, the first core 700 and the second core 800 have memory configurations with different unit cell configurations.
As illustrated in
In other words, “memories with different unit cell configurations” may be used as parts of the memories providing the L1 caches 701, 702, and 801 for the first and second cores 700 and 800. For example, MRAM may be utilized as the L1 instruction cache 701 for the first core 700, SRAM may be utilized as the L1 data cache 702 for the first core 700, and SRAM may be utilized as the L1 cache 801 for the second core 800. Alternatively, SRAM may be utilized as the L1 instruction cache 701 for the first core 700, MRAM may be utilized as the L1 data cache 702 for the first core 700, and SRAM may be utilized as the L1 cache 801 for the second core 800.
A hardware control method for the multi-core processor according to the present embodiment may be similar to the hardware control method according to the first embodiment. Furthermore, for a software control method, the scheduling in (1) to (4) may be utilized as is the case with the first embodiment. However, the software control method is not limited to these schemes.
A hybrid cache configuration of the multi-core processor has been described in which non-volatile memories are utilized as local caches for some cores, whereas volatile memories are utilized as local caches for the remaining cores. In a typical example, a multi-core processor is configured such that non-volatile memories such as MRAM are utilized as local memories for a large number of cores, whereas volatile memories such as SRAM are utilized as local memories for some remaining cores. Moreover, as described above, the scheduler, which allocates processing to the cores, selects a memory (local cache) suitable for each type of processing through the allocation of the processing to the cores.
Therefore, the above-described hybrid cache configuration enables the software to select the appropriate memory according to the characteristics of the program. Thus, the processing efficiency of the processor can be improved while any increase in hardware design cost and circuit area is suppressed.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions.
Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2013-065378 | Mar 2013 | JP | national |