The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER BASED ON INFERRED WORKLOAD PARALLELISM, by Rychlik et al., filed concurrently (Attorney Docket Number 100328U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER IN A VIRTUALIZED SYSTEM, by Rychlik et al., filed concurrently (Attorney Docket Number 100329U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR ASYNCHRONOUSLY AND INDEPENDENTLY CONTROLLING CORE CLOCKS IN A MULTICORE CENTRAL PROCESSING UNIT, by Rychlik et al., filed concurrently (Attorney Docket Number 100330U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH REDUCED FREQUENCY OSCILLATIONS, by Thomson et al., filed concurrently (Attorney Docket Number 100339U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED TRANSIENT DEADLINES, by Thomson et al., filed concurrently (Attorney Docket Number 100340U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED STEADY STATE DEADLINES, by Thomson et al., filed concurrently (Attorney Docket Number 100341U1).
Portable computing devices (PCDs) are ubiquitous. These devices may include cellular telephones, portable digital assistants (PDAs), portable game consoles, palmtop computers, and other portable electronic devices. In addition to the primary function of these devices, many include peripheral functions. For example, a cellular telephone may include the primary function of making cellular telephone calls and the peripheral functions of a still camera, a video camera, global positioning system (GPS) navigation, web browsing, sending and receiving emails, sending and receiving text messages, push-to-talk capabilities, etc. As the functionality of such a device increases, the computing or processing power required to support such functionality also increases. Further, as the computing power increases, there exists a greater need to effectively manage the processor, or processors, that provide the computing power.
Accordingly, what is needed is an improved method of controlling power within a multicore CPU.
In the figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
Referring initially to
In a particular aspect, as depicted in
Referring to
As illustrated in
As further illustrated in
The PCD 320 may further include a temperature sensor 382 that may be used to sense a die temperature associated with the PCD 320. In other words, the temperature sensor 382 may act as a means for sensing a die temperature associated with the PCD 320.
As depicted in
In a particular aspect, one or more of the method steps described herein may be stored in the memory 344 as computer program instructions. These instructions may be executed by the multicore CPU 324 in order to perform the methods described herein. Further, the multicore CPU 324, the memory 344, temperature sensor 382, or a combination thereof may serve as a means for executing one or more of the method steps described herein in order to control a multicore central processing unit based on temperature.
Referring to
Moreover, as illustrated, the memory 404 may include an operating system 420 stored thereon. The operating system 420 may include a scheduler 422 and the scheduler 422 may include a first run queue 424, a second run queue 426, and an Nth run queue 428. The memory 404 may also include a first application 430, a second application 432, and an Nth application 434 stored thereon.
In a particular aspect, the applications 430, 432, 434 may send one or more tasks 436 to the operating system 420 to be processed at the cores 410, 412, 414 within the multicore CPU 402. The tasks 436 may be processed, or executed, as single tasks, threads, or a combination thereof. Further, the scheduler 422 may schedule the tasks, threads, or a combination thereof for execution within the multicore CPU 402. Additionally, the scheduler 422 may place the tasks, threads, or a combination thereof in the run queues 424, 426, 428. The cores 410, 412, 414 may retrieve the tasks, threads, or a combination thereof from the run queues 424, 426, 428 as instructed, e.g., by the operating system 420, for processing, or execution, of those tasks and threads at the cores 410, 412, 414.
In a particular aspect, the parallelism monitor 440 may be a software program that monitors the run queues 424, 426, 428 in the scheduler 422. Each run queue 424, 426, 428 (also known as a ready-to-run queue) may include a list of current tasks, threads, or a combination thereof that are available for scheduling on one or more cores 410, 412, 414. Some multicore systems may only have a single ready-to-run queue. Other multicore systems may have multiple ready-to-run queues. Regardless of the number of ready-to-run queues, at any instant in time, the total number of tasks, threads, or a combination thereof waiting on these queues, plus the number of tasks, threads, or a combination thereof actually running, may be an approximation for the degree of parallelism in the workload.
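The approximation described above may be sketched as follows. This is a minimal illustration rather than the implementation of the parallelism monitor 440; the function name and its parameters are hypothetical, and the queue lengths and running-task count are assumed to be supplied by the caller, since no operating system interface for reading them is specified herein.

```python
from typing import Sequence


def degree_of_parallelism(run_queue_lengths: Sequence[int], running_tasks: int) -> int:
    """Approximate workload parallelism as waiting tasks plus running tasks.

    run_queue_lengths holds one entry per ready-to-run queue (one or many),
    and running_tasks is the number of tasks or threads currently executing.
    Both inputs are hypothetical; how they are sampled is system specific.
    """
    if running_tasks < 0 or any(length < 0 for length in run_queue_lengths):
        raise ValueError("task counts must be non-negative")
    # The number of queues does not matter: only the total waiting count does.
    return sum(run_queue_lengths) + running_tasks
```

The same function covers a system with a single ready-to-run queue (a one-element sequence) and a system with multiple ready-to-run queues.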
Referring to
At block 504, a die temperature may be monitored. Further, at block 506, a power controller may determine a degree of parallelism in the workload associated with the cores.
Moving to block 508, the power controller may independently power the cores up or down based on the degree of workload parallelism, the die temperature, or a combination thereof. Next, at decision 510, the power controller may determine whether the device is powered off. If the device is powered off, the method may end. Otherwise, if the device remains powered on, the method 500 may return to block 504 and the method 500 may continue as described.
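The loop formed by blocks 504 through 510 may be sketched as follows. This is an illustrative outline only; the four callables and the optional iteration cap are assumptions introduced for the example, standing in for whatever sensor, monitor, and power-control interfaces a particular device provides.

```python
from typing import Callable, Optional


def run_method_500(
    read_die_temperature: Callable[[], float],
    infer_parallelism: Callable[[], int],
    adjust_cores: Callable[[int, float], None],
    device_powered_off: Callable[[], bool],
    max_iterations: Optional[int] = None,
) -> int:
    """Repeat blocks 504-508 until decision 510 reports power-off.

    All four callables are hypothetical hooks. max_iterations is not part of
    the described method; it only lets the sketch terminate in a test.
    Returns the number of passes through the loop.
    """
    iterations = 0
    while True:
        die_temperature = read_die_temperature()    # block 504
        parallelism = infer_parallelism()           # block 506
        adjust_cores(parallelism, die_temperature)  # block 508
        iterations += 1
        if device_powered_off():                    # decision 510
            return iterations
        if max_iterations is not None and iterations >= max_iterations:
            return iterations
```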
Referring to
Moving to decision 606, a core controller may determine whether the die temperature is equal to a critical condition. For example, the critical condition may be a threshold temperature above which operation of the device may begin to break down due to temperature issues.
At decision 606, if the die temperature does not equal a critical condition, the method 600 may return to block 602 and the method 600 may continue as described herein. Otherwise, if the die temperature does equal a critical condition, the method 600 may move to decision 608 and the core controller may determine whether a second core is dormant, e.g., the second core may be a CPU1 (the first core may be CPU0).
If the second core is not dormant, i.e., CPU1 is active and executing tasks and threads, the method 600 may proceed to block 610. At block 610, the frequency of the first core, CPU0, may be set, or otherwise moved, to the maximum of lowering the frequency of the first core one incremental step and an optimal frequency, Fopt. In other words, the core controller may reduce the frequency of the first core one incremental step without going below an optimal frequency, Fopt. The incremental step may be one hundred megahertz (100 MHz) or less. Further, the incremental step may be fifty megahertz (50 MHz) or less. From block 610, the method 600 may return to block 602 and the method 600 may continue as described.
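The clamped step-down of block 610 may be expressed compactly. The following is a minimal sketch; the function name, the use of megahertz integers, and the default step of 100 MHz are assumptions chosen for illustration.

```python
def step_down(current_mhz: int, f_opt_mhz: int, step_mhz: int = 100) -> int:
    """Lower a core frequency by one incremental step, but not below Fopt.

    Implements max(current - step, Fopt). Note that, read literally, this
    also returns Fopt when the core is already running below Fopt.
    """
    if step_mhz <= 0:
        raise ValueError("step must be positive")
    return max(current_mhz - step_mhz, f_opt_mhz)
```

For example, with Fopt at 600 MHz, a core at 1000 MHz would move to 900 MHz, while a core at 650 MHz would stop at 600 MHz rather than dropping to 550 MHz.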
Returning to decision 608, if CPU1 is dormant, i.e., CPU1 is powered off, the method 600 may proceed to decision 612. At decision 612, a controller may determine whether a degree of parallelism meets a condition. Specifically, the controller may determine whether the degree of parallelism is greater than a predetermined threshold indicating that there is enough parallelism in the system to warrant the operation of a second core, CPU1.
At decision 612, if the degree of parallelism does not meet the condition, the method may move to block 614 and the frequency of the first core, CPU0, may be set, or otherwise moved, to the maximum of lowering the frequency of the first core one incremental step and an optimal frequency, Fopt. In other words, the core controller may reduce the frequency of the first core one incremental step without going below an optimal frequency, Fopt. Thereafter, the method 600 may return to block 602 and the method 600 may continue as described herein.
Returning to decision 612, if the degree of parallelism meets the condition, the method may proceed to block 616 and the second core, CPU1, may be turned on. Thereafter, at block 618, the frequency of the second core, CPU1, may be set to an optimal frequency, Fopt. Also, at block 618, the frequency of the first core, CPU0, may be set to the maximum of the current frequency of the first core minus the optimal frequency, Fopt, or the optimal frequency, Fopt. For example, if CPU0 is operating at one thousand megahertz (1000 MHz) and CPU1 is powered on to an optimal frequency of six hundred megahertz (600 MHz), the frequency of CPU0 may be changed to six hundred megahertz (600 MHz) because 1000 MHz minus 600 MHz is equal to four hundred megahertz (400 MHz) and 600 MHz (the optimal frequency, Fopt) is greater than 400 MHz (the result of the subtraction operation).
In another example, if CPU0 is operating at one thousand four hundred megahertz (1400 MHz) and CPU1 is powered on to an optimal frequency of six hundred megahertz (600 MHz), the frequency of CPU0 may be changed to eight hundred megahertz (800 MHz) because 1400 MHz minus 600 MHz is equal to eight hundred megahertz (800 MHz) and 800 MHz (the result of the subtraction operation) is greater than 600 MHz (the optimal frequency, Fopt).
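The frequency adjustment applied to the first core at block 618 may be sketched as follows, using the two numeric examples given above. The function name is hypothetical and frequencies are assumed to be expressed in megahertz.

```python
def rebalanced_cpu0_frequency(cpu0_mhz: int, f_opt_mhz: int) -> int:
    """Frequency for CPU0 once CPU1 has been brought up at Fopt.

    CPU0 gives up Fopt worth of frequency to CPU1, but is never set
    below Fopt itself: max(current - Fopt, Fopt).
    """
    return max(cpu0_mhz - f_opt_mhz, f_opt_mhz)


# The two worked examples from the description, with Fopt = 600 MHz.
print(rebalanced_cpu0_frequency(1000, 600))  # 600, since 400 < 600
print(rebalanced_cpu0_frequency(1400, 600))  # 800, since 800 > 600
```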
Moving to decision 620, the controller may determine whether there is sustained parallelism in the system. In other words, the controller may determine whether the degree of parallelism in the system meets a condition for at least a predetermined amount of time. The condition may be a threshold value of parallelism and if the parallelism in the system is greater than the threshold value, the condition may be considered met. At decision 620, if the parallelism is sustained, the method 600 may return to block 602 and the method 600 may continue as described herein.
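One possible way to evaluate the sustained-parallelism condition of decision 620 is sketched below. The sampling scheme is an assumption made for illustration: the description does not state how the degree of parallelism is sampled over time, so the sketch takes a chronological list of hypothetical (timestamp, degree) samples and checks how long the most recent unbroken run above the threshold has lasted.

```python
from typing import Sequence, Tuple


def is_parallelism_sustained(
    samples: Sequence[Tuple[float, int]],
    threshold: int,
    min_duration_s: float,
) -> bool:
    """Return True if parallelism has exceeded threshold for min_duration_s.

    samples is a chronological sequence of (timestamp_seconds, degree) pairs.
    Only the trailing run of samples above the threshold is considered, so a
    single dip at or below the threshold resets the measured duration.
    """
    if not samples:
        return False
    latest_time = samples[-1][0]
    run_start = None
    # Walk backwards from the newest sample until the condition fails.
    for timestamp, degree in reversed(samples):
        if degree <= threshold:
            break
        run_start = timestamp
    return run_start is not None and (latest_time - run_start) >= min_duration_s
```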
Returning to decision 620, if the parallelism is not sustained, the method 600 may proceed to block 622 and the second core, CPU1, may be turned off. Thereafter, the method 600 may return to block 602 and the method 600 may continue as described herein.
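A single pass through decisions 606 through 620 of the method 600 may be sketched as follows. This is an illustrative reading under stated assumptions, not a definitive implementation: the state class and function names are hypothetical, the critical condition is assumed to be met when the die temperature is at or above a threshold, and the sustained-parallelism result is assumed to be computed elsewhere and passed in as a boolean.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DualCoreState:
    cpu0_mhz: int          # CPU0 is assumed to be always powered on
    cpu1_on: bool = False  # False means CPU1 is dormant (powered off)
    cpu1_mhz: int = 0


def method_600_step(state, die_temp_c, critical_temp_c, parallelism,
                    parallelism_threshold, parallelism_sustained,
                    f_opt_mhz, step_mhz=100):
    """Return the new core state after one pass of the method 600."""
    # Decision 606: no action unless the critical condition is met.
    if die_temp_c < critical_temp_c:
        return state
    # Decision 608: is the second core dormant?
    if state.cpu1_on:
        # Block 610: step CPU0 down, but not below Fopt.
        return replace(state, cpu0_mhz=max(state.cpu0_mhz - step_mhz, f_opt_mhz))
    # Decision 612: is there enough parallelism to warrant CPU1?
    if parallelism <= parallelism_threshold:
        # Block 614: same clamped step-down on the only running core.
        return replace(state, cpu0_mhz=max(state.cpu0_mhz - step_mhz, f_opt_mhz))
    # Blocks 616 and 618: turn CPU1 on at Fopt and rebalance CPU0.
    new_state = DualCoreState(
        cpu0_mhz=max(state.cpu0_mhz - f_opt_mhz, f_opt_mhz),
        cpu1_on=True,
        cpu1_mhz=f_opt_mhz,
    )
    # Decision 620 and block 622: turn CPU1 off if parallelism is not sustained.
    if not parallelism_sustained:
        new_state = replace(new_state, cpu1_on=False, cpu1_mhz=0)
    return new_state
```

Returning a new state object rather than mutating the input keeps each pass easy to test; a controller would call this function once per temperature sample and apply the returned frequencies.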
Referring now to
Moving to decision 706, a core controller may determine whether the die temperature is equal to a critical condition. For example, the critical condition may be a threshold temperature above which operation of the device may begin to break down due to temperature issues.
At decision 706, if the die temperature does not equal a critical condition, the method 700 may return to block 702 and the method 700 may continue as described herein. Otherwise, if the die temperature does equal a critical condition, the method 700 may move to decision 708 and the core controller may determine whether a second core is dormant, e.g., the second core may be a CPU1 (the first core may be CPU0). If the second core is not dormant, the method 700 may proceed to decision 802 of
Otherwise, if CPU1 is dormant, i.e., CPU1 is powered off, the method 700 may proceed to decision 710. At decision 710, a controller may determine whether a degree of parallelism meets a condition. Specifically, the controller may determine whether the degree of parallelism is greater than a predetermined threshold indicating that there is enough parallelism in the system to warrant the operation of a second core, CPU1.
At decision 710, if the degree of parallelism does not meet the condition, the method may move to block 712 and the frequency of the first core, CPU0, may be set, or otherwise moved, to the maximum of lowering the frequency of the first core one incremental step and an optimal frequency, Fopt. In other words, the core controller may reduce the frequency of the first core one incremental step without going below an optimal frequency, Fopt. Thereafter, the method 700 may return to block 702 and the method 700 may continue as described herein.
Returning to decision 710, if the degree of parallelism meets the condition, the method may proceed to block 714 and the second CPU, CPU1, may be turned on. Thereafter, at block 716, the frequency of the second core, CPU1, may be set to an optimal frequency, Fopt. Further, at block 716, the frequency of the first core, CPU0, may be set to the maximum of the current frequency of the first core minus the optimal frequency, Fopt, or the optimal frequency, Fopt.
Moving to decision 718, the controller may determine whether there is sustained parallelism in the system. In other words, the controller may determine whether the degree of parallelism in the system meets a condition for at least a predetermined amount of time warranting the continued operation of both cores. The condition may be a threshold value of parallelism and if the parallelism in the system is greater than the threshold value, the condition may be considered met. At decision 718, if the parallelism is sustained, the method 700 may return to block 702 and the method 700 may continue as described herein.
Returning to decision 718, if the parallelism is not sustained, the method 700 may proceed to block 720 and the second core, CPU1, may be turned off. Thereafter, the method 700 may return to block 702 and the method 700 may continue as described herein.
Returning to decision 708, if the second core, CPU1, is not dormant, the method 700 may move to decision 802. At decision 802, the core controller may determine whether an Nth core, CPUN, is dormant. If the Nth core is not dormant, the method 700 may proceed to block 804. At block 804, the frequency of the first core, CPU0, may be set, or otherwise moved, to the maximum of lowering the frequency of the first core one incremental step and an optimal frequency, Fopt. In other words, the core controller may reduce the frequency of the first core one incremental step without going below an optimal frequency, Fopt. Further, the frequency of the second core, CPU1, may be set, or otherwise moved, to the maximum of lowering the frequency of the second core one incremental step and an optimal frequency, Fopt. Also, the frequency of the Nth core, CPUN, may be set, or otherwise moved, to the maximum of lowering the frequency of the Nth core one incremental step and an optimal frequency, Fopt. From block 804, the method 700 may return to block 702 of
Returning to decision 802, if CPUN is dormant, i.e., CPUN is powered off, the method 700 may proceed to decision 806. At decision 806, a controller may determine whether a degree of parallelism meets a condition. Specifically, the controller may determine whether the degree of parallelism is greater than a predetermined threshold indicating that there is enough parallelism in the system to warrant the operation of an Nth core, CPUN.
At decision 806, if the degree of parallelism does not meet the condition, the method may move to block 808 and the frequency of the first core, CPU0, may be set, or otherwise moved, to the maximum of lowering the frequency of the first core one incremental step and an optimal frequency, Fopt. In other words, the core controller may reduce the frequency of the first core one incremental step without going below an optimal frequency, Fopt. Also, at block 808, the frequency of the second core, CPU1, may be set, or otherwise moved, to the maximum of lowering the frequency of the second core one incremental step and an optimal frequency, Fopt. Thereafter, the method 700 may return to block 702 of
Returning to decision 806, if the degree of parallelism meets the condition, the method may proceed to block 810 and the Nth CPU, CPUN, may be turned on. Thereafter, at block 812, the frequency of the Nth core, CPUN, may be set to an optimal frequency, Fopt. Further, at block 812, the frequency of the first core, CPU0, and the second core, CPU1, may be set to the maximum of the current frequency of the first core minus the optimal frequency, Fopt, or the optimal frequency, Fopt.
Moving to decision 814, the controller may determine whether there is sustained parallelism in the system. In other words, the controller may determine whether the degree of parallelism in the system meets a condition for at least a predetermined amount of time to warrant the operation of N cores. The condition may be a threshold value of parallelism and if the parallelism in the system is greater than the threshold value, the condition may be considered met. At decision 814, if the parallelism is sustained, the method 700 may return to block 802 and the method 700 may continue as described herein.
Returning to decision 814, if the parallelism is not sustained, the method 700 may proceed to block 822 and one or more cores may be turned off. Thereafter, the method 700 may return to block 702 of
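The power-up step of blocks 810 and 812 may be generalized to an arbitrary number of cores along the following lines. This is a sketch under assumptions: the function name is hypothetical, a dormant core is represented by None, the lowest-numbered dormant core is the one brought up, and each already-active core is rebalanced from its own current frequency.

```python
from typing import List, Optional


def power_up_next_core(freqs_mhz: List[Optional[int]], f_opt_mhz: int) -> List[Optional[int]]:
    """Bring up the next dormant core at Fopt and rebalance the active cores.

    freqs_mhz holds one entry per core in core order; None marks a dormant
    core. Each active core is set to max(current - Fopt, Fopt). If no core is
    dormant, the frequencies are returned unchanged.
    """
    if None not in freqs_mhz:
        return list(freqs_mhz)
    wake_index = freqs_mhz.index(None)  # lowest-numbered dormant core
    result: List[Optional[int]] = []
    for index, freq in enumerate(freqs_mhz):
        if index == wake_index:
            result.append(f_opt_mhz)    # newly powered core starts at Fopt
        elif freq is None:
            result.append(None)         # other dormant cores stay off
        else:
            result.append(max(freq - f_opt_mhz, f_opt_mhz))
    return result
```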
It is to be understood that the method steps described herein need not necessarily be performed in the order as described. Further, words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the method steps. Moreover, the methods described herein are described as executable on a portable computing device (PCD). The PCD may be a mobile telephone device, a portable digital assistant device, a smartbook computing device, a netbook computing device, a laptop computing device, a desktop computing device, or a combination thereof.
In a particular aspect, it may be appreciated that dynamic power in a system is proportional to V^2 f, where f is the clock frequency and V represents voltage. Voltage is also positively correlated with frequency. In other words, there exists a minimum voltage for the CPU to run at a given clock frequency. As such, the heat generated at the die is roughly proportional to f^3. In certain aspects, it may be possible that when a particular device is assembled, the device may not be able to sufficiently dissipate the heat generated when a CPU core is run at or near its highest frequency.
The system and method disclosed herein provide a way to prevent overheating of a device by exploiting the parallelism in the system and spreading the workload across multiple cores, thereby running each core at a much lower frequency. Because heat generation is cubically non-linear with respect to clock frequency, running a workload on two cores at lower frequencies may generate considerably less heat than running the same workload on a single core at a higher frequency, without sacrificing user experience.
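The effect of the roughly cubic relationship may be illustrated numerically. The sketch below assumes an idealized model in which heat scales exactly with the cube of the clock frequency and in which a workload divides evenly across cores; the function name and the frequencies are illustrative assumptions, not measured values.

```python
from typing import Sequence


def relative_heat(core_freqs_mhz: Sequence[float], reference_mhz: float) -> float:
    """Total heat of the running cores relative to one core at reference_mhz.

    Uses the idealized model heat ~ f^3 for each core and sums over cores.
    """
    if reference_mhz <= 0:
        raise ValueError("reference frequency must be positive")
    return sum((freq / reference_mhz) ** 3 for freq in core_freqs_mhz)


# One core at 1000 MHz versus two cores at 500 MHz each.
print(relative_heat([1000], 1000))      # 1.0
print(relative_heat([500, 500], 1000))  # 0.25
```

Under this model, two cores at half the frequency supply the same aggregate clock rate while generating one quarter of the heat. Real devices may deviate from this figure because voltage does not scale linearly with frequency across the whole operating range.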
In a particular aspect, the degree of parallelism in the workload may be dynamically inferred at the task/thread level by monitoring an operating system state. For example, one operating system state that may be monitored is the length of all OS scheduler ready-to-run queues. The scheduler ready-to-run queue is a list of current tasks or threads that are available for scheduling on CPUs.
Using a parallelism monitor, the system may be able to determine whether there is sufficient parallelism in the system at any point in time and whether the parallelism is sustained over a period of time. Both of these determinations may be used in the load-balancing algorithm described herein.
The load-balancing algorithm disclosed herein may take periodically measured die temperature as one of the inputs. Further, the load-balancing algorithm may compare the die temperature to a threshold temperature, THS, which is the highest temperature still considered safe for proper operation of a handset. In a particular embodiment, THS may be found through experimentation.
In a particular aspect, for each core, there exists a most power efficient voltage and frequency point, Fopt. Fopt may be near the highest frequency level that the minimum operating voltage can sustain. For a homogeneous dual CPU-core based system, both cores running at Fopt may not generate enough heat to take the temperature beyond THS. During operation, the CPU operating frequencies may be incrementally changed and are often a handful of discrete values, typically in steps of 50-100 MHz.
Starting at any point in time, if the temperature reported by the temperature sensor ever crosses the temperature threshold THS, a controller may check whether one core or both cores are running. If only one core is running, the controller checks whether there is enough parallelism in the system. If there is enough parallelism, the controller may bring up the second core at Fopt while reducing the frequency of the first core by the same amount, unless doing so would bring the frequency below Fopt. If reducing the frequency of the first core would bring the frequency below Fopt, the controller leaves the first core at Fopt. Again, it may be appreciated that running both cores at Fopt may not increase the temperature beyond THS. Spreading the work between two cores may cool down the system without loss of MIPS and without harming the user experience.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer program product such as a machine readable medium, i.e., a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
The present application claims priority to U.S. Provisional Patent Application Ser. No. 61/287,011, entitled SYSTEM AND METHOD OF DYNAMICALLY CONTROLLING A PLURALITY OF CORES IN A MULTICORE CENTRAL PROCESSING UNIT, filed on Dec. 16, 2009.
Number | Date | Country
---|---|---
61287011 | Dec 2009 | US