The disclosure of Japanese Patent Application No. 2018-117087 filed on Jun. 20, 2018 including the specification, drawings and abstract is incorporated herein by reference in its entirety.
The present invention relates to a semiconductor integrated circuit, a central processing unit (CPU) allocation method, and a program, and relates to, for example, a semiconductor integrated circuit including a plurality of CPUs.
In order to cause a processor to execute high performance processing with low-power consumption, a multiprocessor system in which a plurality of processors is operated in parallel has been proposed. Multiprocessor systems include symmetric multiple processor (SMP) systems and asymmetric multiple processor (ASMP) systems. The SMP is composed of a plurality of processors having the same performance. On the other hand, the ASMP is composed of combinations of a plurality of processors having different performances.
For example, Japanese unexamined Patent Application No. 2006-338184 discloses an example of an SMP. “ARM Limited, big.LITTLE Technology: The Future of Mobile, [online], Internet <URL: https://www.arm.com/files/pdf/big_LITTLE_Technology_the_Futue_of_Mobile.pdf>, the retrieval date: Feb. 23, 2018” and International Publication No. 2014/188561 disclose examples of an ASMP. “ARM Limited, big.LITTLE Technology: The Future of Mobile, [online], Internet <URL: https://www.arm.com/files/pdf/big_LITTLE_Technology_the_Futue_of_Mobile.pdf>, the retrieval date: Feb. 23, 2018” proposes a technique (big.LITTLE) in which a CPU core (big) having high performance and a CPU core (LITTLE) consuming low power are combined. The big.LITTLE is a technique for realizing both high performance and low power consumption in battery-driven terminal devices such as mobile devices. The big.LITTLE dynamically allocates either a big CPU or a LITTLE CPU to process a task according to the CPU load of the task. As a result, an optimum CPU is allocated to each task to be processed.
International Publication No. 2014/188561 discloses a multi-CPU system having definition information defining a plurality of forms of combinations of types and numbers of CPUs. In the definition information, a plurality of forms is defined such that the maximum value of the overall data processing performance and the power consumption differs in multiple stages. The multi-CPU system allocates the data processing to the CPU specified in the form selected from the definition information in accordance with the environment of the data processing.
For example, a big.LITTLE configuration as disclosed in “ARM Limited, big.LITTLE Technology: The Future of Mobile, [online], Internet <URL: https://www.arm.com/files/pdf/big_LITTLE_Technology_the_Futue_of_Mobile.pdf>, the retrieval date: Feb. 23, 2018” may be required to meet both performance and power consumption requirements. In a typical big.LITTLE configuration, either a big CPU or a LITTLE CPU is allocated based only on the CPU load. Consequently, even when the load of the entire system, including the CPU and associated functional blocks, is high, tasks with low CPU load are allocated to low-performance CPUs. If the CPU that operates the task is determined based only on the CPU load, the required processing performance cannot be achieved in such cases.
Other objects and new features will be apparent from the descriptions of the present specification and the accompanying drawings.
A semiconductor integrated circuit according to an embodiment includes a plurality of CPUs and a plurality of functional blocks. The plurality of CPUs differ in performance. The semiconductor integrated circuit determines an effective CPU allocated to a task realized by at least one of the plurality of functional blocks according to definition information defining a relationship between the plurality of functional blocks and any one of the plurality of CPUs.
A CPU allocation method according to an embodiment is a method of allocating a CPU in a semiconductor integrated circuit including a plurality of CPUs and a plurality of functional blocks. The plurality of CPUs differ in performance. The method comprises determining an effective CPU allocated to a task realized by at least one of the plurality of functional blocks according to definition information defining a relationship between the plurality of functional blocks and any one of the plurality of CPUs.
A program according to an embodiment is a program executed by a semiconductor integrated circuit including a plurality of CPUs and a plurality of functional blocks. The plurality of CPUs differ in performance. The program causes at least one of the plurality of CPUs to execute a step of determining an effective CPU allocated to a task realized by at least one of the plurality of functional blocks according to definition information defining a relationship between the plurality of functional blocks and any one of the plurality of CPUs.
According to one embodiment, it is possible to maintain or improve the performance of a system composed of a plurality of CPUs having different performances.
Hereinafter, each embodiment will be described in detail with reference to the accompanying drawings. The same or corresponding portions are denoted by the same reference numerals, and description thereof will not be repeated.
The semiconductor integrated circuit 100 is connected to a peripheral device 101 via a bus 102. The peripheral device 101 may include, for example, a display 101A, a universal serial bus (USB) device 101B, an SD card 101C, a communication device, such as an I2C IC 101D, and the like.
The semiconductor integrated circuit 100 includes a first CPU group (big CPU) 8 having high data processing performance and high-power consumption, and a second CPU group (LITTLE CPU) 9 having low data processing performance and low-power consumption. The first CPU group 8 includes CPUs 8A, 8B, 8C, and 8D denoted by “CPU0”, “CPU1”, “CPU2”, and “CPU3”, respectively. Each of CPUs 8A, 8B, 8C, and 8D is a high-performance CPU. The second CPU group 9 includes CPUs 9A, 9B, 9C, and 9D denoted by “CPU0”, “CPU1”, “CPU2”, and “CPU3”, respectively. Each of CPUs 9A, 9B, 9C, and 9D is a low-performance CPU. The number of CPUs included in each CPU group is not particularly limited.
The CPUs 8A to 8D of the first CPU group 8 and the CPUs 9A to 9D of the second CPU group 9 are connected to a memory 11, an input/output (I/O) circuit 12, and a functional block group 13 via a bus 10. Further, the CPUs 8A to 8D of the first CPU group 8 and the CPUs 9A to 9D of the second CPU group 9 are connected to a clock pulse generator 6 via an internal clock bus 14.
The input/output circuit 12 is connected to the peripheral device 101 via the bus 102. The functional block group 13 includes a plurality of functional blocks for realizing various functions. In this specification, a “functional block” is constituted by a circuit, i.e., hardware. Hereinafter, the name “device” refers to a functional block. The functional block group 13 may include, but is not limited to, a video coding processor (VCP) 113A, a graphics processing unit (GPU) 113B, a USB controller 113C, an SD host controller 113D, an I2C controller 113E, a clock controller 113F, and the like. Each functional block of the functional block group 13 is connected to the clock pulse generator 6 via a peripheral clock bus 15.
The clock pulse generator 6 receives a source clock from a crystal oscillator 105 and generates an internal clock and a peripheral clock. The clock pulse generator 6 supplies the internal clock to the first CPU group 8 and the second CPU group 9 via an internal clock bus 14. Further, the clock pulse generator 6 supplies the peripheral clock to each functional block of the functional block group 13 via the peripheral clock bus 15.
A power supply 106 supplies a power supply voltage to the semiconductor integrated circuit 100. When a task transitions to a RUNNING state, the functional blocks for realizing the task are powered on. Further, the peripheral clock is supplied to the functional blocks. As a result, the functional blocks are put into an operating state. The clock controller 113F manages the supply of a clock to each functional block. Therefore, the clock controller 113F can determine whether each functional block is in an on state or an off state.
The first CPU group 8 and the second CPU group 9 execute programs. A “program” is an operating system (OS) or an application program. Each CPU executes the operating system and the application program.
As described above, the hardware layer 121 includes the first CPU group 8, the second CPU group 9, and the functional block group 13. The functional block group 13 includes the clock controller 113F that manages a clock supply state (clock ON/OFF) to each functional block.
The software layer 122 is an operating system running on the CPU. One example of the operating system is Linux (registered trademark). The software layer 122 implements functions of a scheduler 4 and a device driver 5. The scheduler 4 is a function used for task management, and performs scheduling or dispatching for allocating tasks to CPUs. The device driver 5 controls each functional block included in the functional block group 13. Further, the device driver 5 receives information on the supply of the clock to each functional block from the clock controller 113F, and detects whether the functional block is in the on state or the off state. The device driver exists for each functional block. For simplicity of illustration, a plurality of device drivers is collectively referred to as the device driver 5 in
The user space layer 123 is an application program that runs on the CPU. A governor 1 determines whether the tasks are to be allocated to the big CPU (CPUs 8A to 8D) or to the LITTLE CPU (CPUs 9A to 9D). That is, the governor 1 functions as a CPU allocation determination unit.
A device table 2 holds definition information defining correspondence relationships between a plurality of functional blocks (devices) and a plurality of CPUs. That is, the device table 2 has information on the effective CPU allocated to the operation of the task (effective CPU information). The CPU allocated to each of the plurality of functional blocks is defined in advance according to a processing time of the functional block.
The definition information included in the device table 2 is defined in advance by, for example, performing profiling of the system. The device table 2 is stored in, for example, a nonvolatile memory. The device table 2 is called from the nonvolatile memory by the CPU when the program is executed, and is temporarily stored in a volatile memory inside the CPU.
The governor 1 includes an effective CPU specifying unit 1A and an effective CPU calculating unit 1B. The effective CPU specifying unit 1A receives information on the status (on or off) of each functional block from the device driver 5, and refers to the device table 2 to determine a CPU group to which a task in an execution state is to be allocated from the first CPU group 8 and the second CPU group 9. Alternatively, the effective CPU specifying unit 1A determines the CPU group to which the task in the execution state is to be allocated from the first CPU group 8 and the second CPU group 9 based on the processing time of the CPU calculated by the effective CPU calculating unit 1B. The effective CPU specifying unit 1A inputs the determined CPU group to the scheduler 4. As a result, the task in the execution state is operated by the CPU of the specified CPU group.
The effective CPU calculating unit 1B calculates the processing time or usage rate of each CPU of the first CPU group 8 and the second CPU group 9. For example, the effective CPU calculating unit 1B acquires information on the CPU load from the scheduler 4 and calculates the processing time or the usage rate of the CPU. The scheduler 4 generates information about the load of the big CPU or the LITTLE CPU and sends the information to the governor 1. The information on the load of the CPU is the processing time of the CPU or the usage rate of the CPU described above.
The governor 1 and the scheduler 4 may be executed in either the first CPU group 8 (big CPU) or the second CPU group 9 (LITTLE CPU). Depending on the load conditions of these CPUs, it may be decided whether to execute the governor 1 or the scheduler 4 in the big CPU or the LITTLE CPU. The device driver 5 may be executed by the same CPU as the CPU allocated to the task.
The governor 1 further refers to the device table 2. When the functional block in the operation state is a functional block registered in the device table 2, the effective CPU information is obtained from the device table 2. Based on the effective CPU information, the governor 1 determines a CPU (effective CPU) to be allocated to a task in the execution state. As shown in
When a plurality of tasks is in the execution state but all of the functional blocks registered in the device table 2 are in the non-operation state (i.e., when a functional block not registered in the device table 2 is in operation), the CPU allocation determination unit (governor 1) determines the effective CPU based on the CPU load. For example, the governor 1 determines the effective CPU based on the CPU processing time. When the CPU processing time is long (the CPU load is large), a big CPU is allocated, and when the CPU processing time is short (the CPU load is small), a LITTLE CPU is allocated. Instead of the CPU processing time, the CPU usage rate can be used to determine the effective CPU.
In the case of a symmetric multiple processor (SMP) system, since the system is composed of a plurality of CPUs of the same type, the processing performance of the CPUs is basically the same. Therefore, in the SMP, even if the effective CPU is determined based only on the CPU processing time, performance degradation does not appear to a noticeable degree. On the other hand, in the present embodiment, an asymmetric multiple processor (ASMP) system is applied. In the ASMP, when the effective CPU is determined based only on the CPU load (e.g., CPU processing time), the control may, depending on the use case in operation, degrade the performance even when the performance of the system should be maintained or improved.
It is assumed that scheduling is executed so that CPUs are allocated based only on the CPU load. In this instance, task 1 is allocated a big CPU and task 2 is allocated a LITTLE CPU. However, the use case does not consist only of the CPU, but rather of the entire system including the CPU and the functional blocks associated with the CPU. Therefore, the load of the entire system consists of the load of the CPU and the load of the hardware (HW). In order to realize the performance (throughput/low latency) required by the application, it is necessary to select an effective CPU in consideration of not only the CPU processing time but also the load of the functional block (for example, the processing time of the functional block). For example, executing task 2 on the LITTLE CPU may degrade the performance of the system.
Since the processing time is shortened by operating the tasks by the big CPU, the processing time of the GPU can be secured. In this case, since the total processing time is shortened, a higher frame rate can be realized. On the other hand, by operating tasks by the LITTLE CPU, the processing time is lengthened. In this case, it is conceivable that the processing does not end within a certain period of time. Therefore, the entire processing time becomes longer. That is, the frame rate decreases. In the combination of the big CPU and the GPU, the frame rate is, for example, 60 fps (drawing time of one frame is 16.6 ms). On the other hand, in the combination of the LITTLE CPU and the GPU, the frame rate is, for example, 50 fps (drawing time of one frame is 20.0 ms).
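The frame rates quoted above are the reciprocal of the drawing time of one frame. The following is a minimal sketch of that arithmetic; the function name is hypothetical and is not part of the embodiment.

```python
def frame_rate_fps(frame_time_ms):
    """Return frames per second for a given drawing time of one frame (ms)."""
    return 1000.0 / frame_time_ms


# big CPU + GPU: about 16.6 ms per frame, i.e. roughly 60 fps
big_fps = frame_rate_fps(16.6)
# LITTLE CPU + GPU: 20.0 ms per frame, i.e. 50 fps
little_fps = frame_rate_fps(20.0)
```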
In this embodiment, the effective CPU is determined based on the definition information stored in the device table 2. The definition information is information on the correspondence relationship between the functional block and the effective CPU. This correspondence relationship is set in consideration of not only the CPU processing time of the task but also the processing time of the functional block. The effective CPU allocated to the task is determined according to the definition information. Thus, in a system composed of a plurality of CPUs having different performances, it is possible to prevent deterioration of the performance of the system.
Referring to
In step S2, the governor 1 executes table determination processing. The governor 1 determines whether or not each of the plurality of functional blocks in the operation state is a functional block specified in the device table 2. When at least one functional block specified in the device table 2 is included in the plurality of functional blocks in the operation state (step S2: “specified”), the governor 1 executes effective CPU determination processing in step S3.
In the determination processing of step S3, when at least one of the plurality of functional blocks in the operation state is associated with the big CPU in the device table 2, the governor 1 determines that the effective CPU allocated to the task is the big CPU. In step S4, the governor 1 sets the big CPU to the effective CPU. The governor 1 inputs the effective CPU (big CPU) to the scheduler 4. On the other hand, if a functional block registered in the device table 2 is included in the plurality of functional blocks in the operation state, but the big CPU is not associated with the registered functional block (in other words, the LITTLE CPU is associated with the registered functional block), the governor 1 determines that the effective CPU allocated to the task is the LITTLE CPU. In step S5, the governor 1 sets the LITTLE CPU to the effective CPU. Similarly to step S4, the governor 1 inputs the effective CPU (LITTLE CPU) to the scheduler 4. When the effective CPU is input to the scheduler 4 in step S4 or S5, the allocation processing of the effective CPU ends. Thereafter, software processing for the operation of the task is executed.
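The table-based determination of steps S2 to S5 can be sketched as follows. This is an illustrative sketch only: the table contents, the block names, and the function name are assumptions, and the actual definition information would be obtained by profiling the system.

```python
BIG, LITTLE = "big", "LITTLE"

# Hypothetical device table: functional block -> effective CPU.
DEVICE_TABLE = {"GPU": BIG, "USB": BIG, "I2C": LITTLE}


def determine_by_table(active_blocks, table=DEVICE_TABLE):
    """Steps S2 to S5: return BIG, LITTLE, or None for "not specified"."""
    # Step S2: keep only the operating blocks that are registered in the table.
    registered = [block for block in active_blocks if block in table]
    if not registered:
        return None  # "not specified": the CPU-load path is used instead
    # Step S3: a single block associated with the big CPU is sufficient.
    if any(table[block] == BIG for block in registered):
        return BIG  # step S4
    return LITTLE  # step S5
```

For example, `determine_by_table(["I2C", "GPU"])` yields the big CPU because one operating block is associated with it, whereas `determine_by_table(["I2C"])` yields the LITTLE CPU.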
When a plurality of tasks is transitioning to the execution state, the timings for determining the effective CPUs of the tasks are substantially the same. When at least one functional block associated with the big CPU is included in the plurality of functional blocks in the operation state, the effective CPUs are set so that all tasks are executed by the big CPU. Referring to
When none of the plurality of functional blocks in the operation state is specified in the device table 2 (step S2: “not specified”), the governor 1 determines the effective CPU allocated to the task realized by the functional block in accordance with information on the CPU load for operating the task. “Not specified” in step S2 corresponds to a case where all the functional blocks registered in the device table 2 are in the non-operation state. In detail, the governor 1 acquires a CPU processing time based on the scheduling by the scheduler 4 (step S6). Next, the CPU processing time is evaluated (step S7). The processing of steps S6 and S7 is executed by the effective CPU calculating unit 1B.
In step S7, the effective CPU calculating unit 1B determines whether or not the CPU processing time is longer than a preset reference time. If it is determined that the CPU processing time is longer than the reference time, the processing proceeds to step S4. Therefore, the big CPU is set to the effective CPU. On the other hand, when it is determined that the CPU processing time is shorter than the reference time, the processing proceeds to step S5, and the LITTLE CPU is set to the effective CPU. Since the effective CPU is set according to the load of the CPU, when high processing performance is required, the required performance can be satisfied by operating the task on the high-performance CPU. On the other hand, when high processing performance is not necessarily required, power consumption can be reduced by operating the task on the low-performance CPU.
In the processing of step S7, the CPU usage rate may be used instead of the CPU processing time. In this case, the CPU usage rate is compared to a reference usage rate. If the CPU usage rate exceeds the reference usage rate, the big CPU is set to the effective CPU in step S4. If the CPU usage rate is less than the reference usage rate, the LITTLE CPU is set to the effective CPU in step S5.
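The comparison of step S7 can be sketched as a single function that accepts either a CPU processing time or a CPU usage rate, together with the matching reference value. The function name is hypothetical, and the handling of a value exactly equal to the reference is an assumption, since the description does not state it.

```python
BIG, LITTLE = "big", "LITTLE"


def determine_by_load(measured, reference):
    """Step S7: compare the CPU processing time or usage rate with its reference."""
    # Assumption: a value equal to the reference is treated as a low load.
    if measured > reference:
        return BIG  # step S4
    return LITTLE  # step S5
```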
The processing proceeds from step S7 to step S4 or step S5, and when the effective CPU is input to the scheduler 4, the allocation processing of the effective CPU ends. Thereafter, software processing for the operation of the task is executed.
If the functional block associated with the task is unknown (e.g., if an entirely new task transitions to the execution state), the governor 1 has no information about the effective CPU. In this instance, the governor 1 sets the big CPU to the effective CPU. As a result, the performance of the system can be maintained. When the functional block is unknown, the governor 1 may set the effective CPU according to the operation state after the functional block becomes known.
In the first embodiment, the functional blocks and the effective CPUs are associated with each other by the device table 2. The governor 1 determines the effective CPU by referring to the device table 2. When a high processing performance is required for the system, i.e., a combination of a functional block and an effective CPU, the functional block and the high-performance CPU are associated with each other in the device table 2. Even if the task is a task with low CPU load, a high-performance CPU can be allocated to the task according to the device table 2. In particular, when at least one functional block associated with the big CPU is included in the plurality of functional blocks in the operation state, the effective CPUs are set so that all the tasks in operation are executed collectively by the big CPU. As a result, the processing performance of the system can be maintained or improved. On the other hand, when all the plurality of functional blocks in the operation state are associated with the LITTLE CPU, the effective CPUs are set so that all the tasks in the execution state are operated in the LITTLE CPU. When a high performance is not necessarily required, the power consumption of the system can be reduced by the task being operated by the low-performance CPU.
For example, in recent in-vehicle information systems, there is a strong demand for performance, such as improving operation response while simultaneously operating a plurality of applications. For this reason, for example, in an in-vehicle information system, a multiprocessor is required to execute a plurality of processes in parallel. On the other hand, it is often technically difficult to mount a plurality of high-performance CPUs in a system for reasons such as limitation of the size of the hardware (die). Therefore, in the in-vehicle information system, not only a high-performance CPU but also a low-performance CPU, which is advantageous in terms of chip area, is required. In the first embodiment, tasks can be allocated to appropriate CPUs based on the definition information or the CPU processing time. Therefore, in a system composed of a plurality of CPUs having different performances, it is possible to improve the performance of parallel processing while preventing performance degradation.
According to the first embodiment, the effective CPU can be allocated by setting the device table 2. The allocation of the effective CPU according to the first embodiment is advantageous in that it can be easily implemented in a system.
Since a configuration of a semiconductor integrated circuit according to a second embodiment is basically the same as the configuration of the semiconductor integrated circuit 100 shown in
Referring to
The device driver 5 monitors the processing time of the corresponding functional block. The monitored processing time value is sent from the device driver 5 to the governor 1.
As shown in
Next, processing of step S11 is executed instead of the processing of step S1 shown in
Next, the processing of step S2 is executed. The governor 1 refers to the device table 2 and determines whether or not the functional block specified in the device table 2 is included in the functional blocks in the operation state. When the functional block specified in the device table 2 is included in the functional blocks in the operation state (step S2: “specified”), the governor 1 executes effective CPU determination processing in step S3A.
If there is a functional block associated with the big CPU among the functional blocks in the operation state, and the processing time of the functional block exceeds the threshold value, the processing proceeds to step S4. In this instance, the effective CPU is set to cause tasks to be executed by the big CPU.
If a functional block associated with the big CPU is included in the functional blocks in the operation state, but the processing time of the functional block is less than the threshold value, or if all the functional blocks in the operation state are associated with the LITTLE CPU, the processing proceeds to step S5. The governor 1 sets the effective CPUs for the tasks to the LITTLE CPU.
Similarly to the first embodiment, when all of the functional blocks registered in the device table 2 are in non-operation state, the processing proceeds to steps S2, S6, and S7 in this order. In step S4 or step S5, the effective CPU for operating the task is determined based on the CPU processing time.
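The threshold-based determination of the second embodiment can be sketched as follows. The table entries, the threshold values, the millisecond unit, and the function name are all assumptions made for illustration; in particular, whether a threshold is registered for blocks associated with the LITTLE CPU is not stated, so none is assumed here.

```python
BIG, LITTLE = "big", "LITTLE"

# Hypothetical device table: block -> (effective CPU, threshold in ms or None).
DEVICE_TABLE = {
    "USB": (BIG, 5.0),
    "SD": (BIG, 5.0),
    "I2C": (LITTLE, None),
}


def determine_with_threshold(active_times, table=DEVICE_TABLE):
    """active_times maps each operating block to its monitored processing time (ms)."""
    registered = {b: t for b, t in active_times.items() if b in table}
    if not registered:
        return None  # step S2 "not specified": steps S6 and S7 are used instead
    for block, time_ms in registered.items():
        cpu, threshold = table[block]
        # Step S3A: big only if the block is tied to big AND exceeds its threshold.
        if cpu == BIG and time_ms > threshold:
            return BIG  # step S4
    return LITTLE  # step S5
```

Under these assumed values, a USB block with a short processing time (for example a mouse) leads to the LITTLE CPU, while a USB block with a long processing time (for example a USB memory) leads to the big CPU.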
Taking USB as an example, some USB devices, such as a USB mouse, have a relatively small CPU load when a task is operated. On the other hand, some USB devices have a relatively large CPU load, such as a USB memory. According to the first embodiment, since the big CPU is defined in the device table 2 so as to be allocated to the USB device, the big CPU is set to the effective CPUs regardless of the type of the USB device. Therefore, in the first embodiment, even if a task that does not necessarily require a high-performance CPU, such as an operation of the USB mouse, is included in a plurality of tasks, the big CPU is allocated to the task (the operation of the USB mouse) based on the information defined in the device table 2. In the first embodiment, when a plurality of tasks is concurrently in the execution state and the plurality of tasks includes the task of operating the USB mouse, the big CPU is allocated to all of the plurality of tasks.
In the second embodiment, the governor 1 obtains the processing time of the functional block in addition to the CPU processing time. In the device table 2, not only are the functional block and the effective CPU associated with each other, but the threshold value of the processing time of the functional block is also registered. According to the second embodiment, even when a functional block is associated with the big CPU in the device table 2, all the tasks in operation are allocated to the LITTLE CPU if the processing time of the functional block is less than the threshold value.
For example, when a USB memory, an SD card, and an I2C IC are used, a USB controller, an SD host controller, and an I2C controller are used as functional blocks. As shown in
Since a configuration of a semiconductor integrated circuit according to a third embodiment is basically the same as the configuration of the semiconductor integrated circuit 100 shown in
Referring to
The application table 3 is a table in which functional blocks used for executing an application are registered. For example, once an application is executed, the functional blocks used for that application are registered in the application table 3. At the time of the second or subsequent execution of the application, the functional block to be used can be specified from the information registered in the application table 3. However, the means and processes for associating the application with the functional blocks are not limited to the methods described above.
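One possible way to hold such registrations is a mapping from an application to the set of functional blocks observed during its execution. The following is a sketch under that assumption; the names `application_table`, `register_usage`, and `blocks_for`, as well as the application name in the usage lines, are hypothetical.

```python
# Hypothetical application table: application name -> set of functional blocks.
application_table = {}


def register_usage(app, block):
    """Record that the application used the given functional block."""
    application_table.setdefault(app, set()).add(block)


def blocks_for(app):
    """Return the registered blocks, or None if the application has not run yet."""
    return application_table.get(app)


first_lookup = blocks_for("navigation")   # nothing registered before the first run
register_usage("navigation", "GPU")
register_usage("navigation", "USB")
second_lookup = blocks_for("navigation")  # blocks known at the second execution
```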
Referring to
In step S3B, the governor 1 acquires information on the effective CPU and the threshold value associated with the functional block from the device table 2. If the processing time of the functional block associated with the big CPU exceeds the threshold value, the governor 1 sets the effective CPU to execute the task in the big CPU in step S4. Therefore, the performance of the system is prevented from deteriorating. On the other hand, if the processing time of the functional block is less than the threshold value even though the effective CPU corresponding to the specified functional block is the big CPU, or if the LITTLE CPU is registered as the effective CPU corresponding to the specified functional block, the processing proceeds to step S5. The governor 1 sets the effective CPU for the task to the LITTLE CPU. This makes it possible to reduce the power consumption of the system.
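The per-task determination of step S3B can be sketched as follows. The table contents, thresholds, task names, and function names are assumptions for illustration, and returning None for a task whose functional blocks are not registered is a choice made in this sketch rather than something stated in the description.

```python
BIG, LITTLE = "big", "LITTLE"

# Hypothetical device table: block -> (effective CPU, threshold in ms or None).
DEVICE_TABLE = {"GPU": (BIG, 4.0), "USB": (BIG, 5.0), "I2C": (LITTLE, None)}


def effective_cpu_for_task(blocks, block_times, table=DEVICE_TABLE):
    """Step S3B for one task, given the blocks it uses and their times (ms)."""
    registered = [b for b in blocks if b in table]
    if not registered:
        return None  # assumption: left to another path, e.g. the CPU load
    for block in registered:
        cpu, threshold = table[block]
        if cpu == BIG and block_times.get(block, 0.0) > threshold:
            return BIG  # step S4
    return LITTLE  # step S5


def allocate(task_blocks, block_times):
    """Decide the effective CPU separately for every task in the execution state."""
    return {task: effective_cpu_for_task(blocks, block_times)
            for task, blocks in task_blocks.items()}


tasks = {"render": ["GPU"], "mouse": ["USB"], "sensor": ["I2C"]}
times = {"GPU": 9.0, "USB": 0.5, "I2C": 1.0}
result = allocate(tasks, times)
```

With these assumed values, only the task whose functional block exceeds its threshold is placed on the big CPU, so concurrently running tasks are split between the two CPU groups.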
In the third embodiment, the processing time of the functional block is grasped for each task in the execution state, and the task requiring the processing performance is specified. The governor 1 allocates a big CPU to the effective CPU that operates the task. This makes it possible to avoid performance degradation of the system. On the other hand, the task that does not require performance is specified, and a LITTLE CPU is allocated to the effective CPU that operates the task. As a result, power consumption can be reduced. Since a plurality of tasks which is simultaneously executed can be shared by the big CPU and the LITTLE CPU from the viewpoint of the required processing performance, the parallel processing performance can be improved while suppressing an increase in power consumed. Further, in the third embodiment, the governor 1 checks the functional block to be used. As a result, the governor 1 can reliably determine whether or not the functional block is a functional block registered in the device table 2. Thus, the optimum CPU from the big CPU and the LITTLE CPU can be allocated for the operation of the task.
According to each of the above embodiments, the semiconductor integrated circuit 100 includes two CPU groups having different performances. However, the present embodiment is not so limited. The semiconductor integrated circuit 100 may include three or more CPU groups having different performances.
The semiconductor integrated circuit according to each of the above embodiments is not limited to application to an in-vehicle system. For example, the semiconductor integrated circuit according to each of the above embodiments can be applied to a server system. While the server system is required to have high performance, it is also required to suppress heat generation. According to each of the above embodiments, a high-performance CPU can be allocated even to a task having a low CPU load when processing performance is required. Therefore, the processing performance of the server system can be maintained or improved. On the other hand, according to each of the above embodiments, a low-performance CPU can be allocated when a task that does not necessarily require high processing performance is operated. In this case, heat generation of the server system can be suppressed.
When a plurality of tasks is sequentially executed one by one, the number of functional devices in operation is one. Therefore, it is also possible to specify the functional device in operation in the first embodiment and the second embodiment. In such a case, the processing relating to the allocation of the effective CPU is substantially the same between the second embodiment and the third embodiment. The allocation of the effective CPU in the first embodiment is different from the allocation of the effective CPU in the second embodiment and the third embodiment in that the processing of comparing the processing time of the functional block associated with the big CPU with the threshold value is omitted.
Although the invention made by the present inventors has been specifically described based on the embodiment, the present invention is not limited to the above embodiment, and needless to say, various changes may be made without departing from the scope thereof.
Number | Date | Country | Kind |
---|---|---|---|
2018-117087 | Jun 2018 | JP | national |