Portable computing devices (PCDs) are ubiquitous. These devices may include cellular telephones, portable digital assistants (PDAs), portable game consoles, palmtop computers, and other portable electronic devices. In addition to the primary function of these devices, many include peripheral functions. For example, a cellular telephone may include the primary function of making cellular telephone calls and the peripheral functions of a still camera, a video camera, global positioning system (GPS) navigation, web browsing, sending and receiving emails, sending and receiving text messages, push-to-talk capabilities, etc. As the functionality of such a device increases, the computing or processing power required to support such functionality also increases. Further, as the computing power increases, there exists a greater need to effectively manage the processor, or processors, that provide the computing power.
Accordingly, what is needed is an improved method of controlling power within a CPU.
In the figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
Referring initially to
In a particular aspect, as depicted in
Referring to
As illustrated in
As further illustrated in
As depicted in
In a particular aspect, one or more of the method steps described herein may be stored in the memory 344 as computer program instructions. These instructions may be executed by the multicore CPU 324 in order to perform the methods described herein. Further, the multicore CPU 324, the memory 344, or a combination thereof may serve as a means for executing one or more of the method steps described herein in order to dynamically control the power within a CPU, or core, of the multicore CPU 324.
Referring to
Moreover, as illustrated, the memory 404 may include an operating system 420 stored thereon. The operating system 420 may include a scheduler 422 and the scheduler 422 may include a first run queue 424, a second run queue 426, and an Nth run queue 428. The memory 404 may also include a first application 430, a second application 432, and an Nth application 434 stored thereon.
In a particular aspect, the applications 430, 432, 434 may send one or more tasks 436 to the operating system 420 to be processed at the cores 410, 412, 414 within the multicore CPU 402. The tasks 436 may be processed, or executed, as single tasks, threads, or a combination thereof. Further, the scheduler 422 may schedule the tasks, threads, or a combination thereof for execution within the multicore CPU 402. Additionally, the scheduler 422 may place the tasks, threads, or a combination thereof in the run queues 424, 426, 428. The cores 410, 412, 414 may retrieve the tasks, threads, or a combination thereof from the run queues 424, 426, 428 as instructed, e.g., by the operating system 420, for processing, or execution, of those tasks and threads at the cores 410, 412, 414.
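The relationship between the scheduler, the run queues, and the cores may be illustrated with a minimal sketch. The class name, the method names, and the rule of placing each task in the shortest run queue are assumptions made for illustration only; the description does not state how a run queue is selected.

```python
from collections import deque


class Scheduler:
    """Toy scheduler holding one run queue per core (hypothetical layout)."""

    def __init__(self, num_queues):
        self.run_queues = [deque() for _ in range(num_queues)]

    def submit(self, task):
        # Assumption: place the task in the shortest run queue.
        # The description does not say how a queue is chosen.
        queue = min(self.run_queues, key=len)
        queue.append(task)

    def next_task(self, core_index):
        # A core retrieves the oldest task from its run queue, if any.
        queue = self.run_queues[core_index]
        return queue.popleft() if queue else None


scheduler = Scheduler(num_queues=2)
for task in ["task_a", "task_b", "task_c"]:
    scheduler.submit(task)
first_on_core_0 = scheduler.next_task(0)
```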
Referring to
At block 504, a power controller, e.g., a dynamic clock and voltage scaling (DCVS) algorithm, may monitor one or more CPUs. At decision 506, the power controller may determine whether the CPU is idle. If not, the method 500 may return to block 504 and continue as described herein. Otherwise, if the CPU is idle, the method 500 may proceed to block 508 and the power controller may review a busy cycle, i.e., operation window, immediately prior to the current idle state. At block 510, the power controller may determine the total work load during the previous busy cycle. Further, at block 512, the power controller may review the operational frequencies utilized during the previous busy cycle.
Moving to decision 513, the power controller may determine whether the previous busy cycle ended at the steady state level. If so, the method 500 may proceed to block 516 and the power controller may set the CPU frequency to a steady state value. Then, the method 500 may proceed to decision 518. At decision 518, the power controller may determine whether the device is powered off. If the device is powered off, the method may end. Otherwise, if the device remains powered on, the method 500 may return to block 504 and the method 500 may continue as described.
Returning to decision 513, if the previous busy cycle did not end at the steady state, the method 500 may move to decision 514 and the power controller may determine whether the previous busy cycle included any frequency jumps or is at the maximum performance level, e.g., due to workload increases. If so, the method 500 may proceed to block 517 and the power controller may reset the longest normalized busy period. The method 500 may then continue to block 520 and proceed as described herein.
Returning to decision 514, if the previous busy cycle did not include any frequency jumps and is not at the maximum performance level, the method 500 may proceed to block 520 and the power controller may determine the longest normalized busy period since being reset. At block 522, the power controller may determine a minimum operational frequency that would not have caused a frequency jump had it been used starting at the time the longest busy period was last reset. Next, the power controller sets the CPU frequency to the minimum frequency determined above and resets the longest busy period if the minimum frequency is not the same as the previous CPU frequency. The method 500 may then proceed to decision 518. At decision 518, the power controller may determine whether the device is powered off. If the device is powered off, the method may end. Otherwise, if the device remains powered on, the method 500 may return to block 504 and the method 500 may continue as described.
In a particular aspect, the method 500 may include a steady state and a transient state. Decision 513 may be used to control the transition between the steady state and the transient state. Having the ability to transition between the steady state and the transient state may reduce excessive oscillation in the frequency. Further, the method 500 may be considered self-tuning and may provide dynamic window sizes.
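The branching of the method 500 once the CPU is found idle may be summarized in a short sketch. The function name and its two boolean inputs are hypothetical; the block numbers in the returned list are those used in the description above.

```python
def blocks_after_idle(ended_at_steady_state, had_jump_or_at_max):
    """Return the blocks of method 500 visited after decision 513."""
    if ended_at_steady_state:      # decision 513
        return [516]               # set the CPU frequency to the steady state value
    visited = []
    if had_jump_or_at_max:         # decision 514
        visited.append(517)        # reset the longest normalized busy period
    # Block 520: determine the longest normalized busy period since reset.
    # Block 522: determine the minimum frequency that avoids a jump.
    visited += [520, 522]
    return visited
```

In every case the method then reaches decision 518, which is omitted from the returned list.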
Referring to
At block 608, the CPU may enter a software wait for interrupt (SWFI) condition. At block 610, the CPU may exit the SWFI condition. Moving to block 612, the power controller may set an end idle time (EndIdleTime) equal to a current time (CurrentTime). At decision 614, the power controller may determine whether the highest CPU frequency of the previous busy cycle is greater than a steady state frequency. If not, the method 600 may end. Otherwise, the method 600 may proceed to decision 702 of
At decision 702 of
The BusyTimeAtMax may be determined using the following formula:
For example, if the previous busy cycle was two milliseconds (2 ms) and the CPU spent one millisecond (1 ms) at a maximum frequency of one gigahertz (1 GHz) and one millisecond (1 ms) at a nominal frequency of one hundred megahertz (100 MHz), the BusyTimeAtMax would be equal to 1.1 ms.
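One calculation that reproduces the 1.1 ms figure in this example is to scale the time spent at each frequency by the ratio of that frequency to the maximum frequency and to sum the results. The sketch below rests on that assumption; the function name and the list-of-pairs input are hypothetical and are not taken from the description.

```python
def busy_time_at_max(segments, max_freq):
    """Normalize a busy cycle to the equivalent busy time at max_freq.

    segments: iterable of (duration_ms, freq_mhz) pairs, one pair per
    frequency used during the previous busy cycle (assumed input format).
    """
    return sum(duration * freq / max_freq for duration, freq in segments)


# 1 ms at 1 GHz plus 1 ms at 100 MHz, normalized to 1 GHz.
example = busy_time_at_max([(1.0, 1000), (1.0, 100)], max_freq=1000)
```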
Moving to block 706, the power controller may determine a CPU frequency (CPUFreq) that would not have caused any frequency jumps. Specifically, the power controller may determine the lowest CPU frequency that would have eliminated any frequency jumps. This is determined by calculating the slack budget for each CPU frequency that would be calculated by the transient filter and determining whether the slack budget is sufficient that it would not cause the transient filter to make a frequency jump.
This determination may be made by setting the CPUFreq equal to the minimum CPU frequency (MinCPUFreq) and then performing a do loop until a condition is met. Each time the condition is not met the CPUFreq may be increased by one value (CPUFreq=CPUFreq+1) until a maximum CPU frequency (MaxCPUFreq) is met. The condition is as follows:
(((MaxCPUFreq*slackBudget)/(MaxCPUFreq−CPUFreq))*SteadyStateAdjustment)<=(BusyTimeAtMax*(MaxCPUFreq/CPUFreq))
where,
Once the condition is met, the CPU frequency may be set to the CPUFreq above that meets the condition. Moving to blocks 708 through 712, the power controller may initialize the state of the energy minimization algorithm described herein. Specifically, at block 708, the power controller may set a last lower time (LastLowerTime) equal to the end idle time (EndIdleTime). At block 710, the power controller may set a maximum busy time at maximum (MaxBusyTimeAtMax) equal to zero. Next, at block 712, the power controller may set a total busy time (TotalBusyTime) equal to zero. Thereafter, the method 600 may end.
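To make the two sides of the inequality used at block 706 concrete, the following sketch evaluates them for a few candidate frequencies. All numeric values are hypothetical, and because the definitions of slackBudget and SteadyStateAdjustment are not reproduced here, the default SteadyStateAdjustment of 1.0 is an assumption.

```python
def condition_sides(cpu_freq, max_freq, slack_budget, busy_time_at_max,
                    steady_state_adjustment=1.0):
    """Return (left, right) of the inequality for one candidate CPUFreq.

    cpu_freq must lie strictly between zero and max_freq; at max_freq the
    left side would divide by zero.
    """
    if not 0 < cpu_freq < max_freq:
        raise ValueError("cpu_freq must be in (0, max_freq)")
    left = ((max_freq * slack_budget) / (max_freq - cpu_freq)) * steady_state_adjustment
    right = busy_time_at_max * (max_freq / cpu_freq)
    return left, right


# Hypothetical values: 1 GHz maximum, 1 ms slack budget, 1.1 ms BusyTimeAtMax.
for freq in (100, 250, 500, 750):
    left, right = condition_sides(freq, 1000, 1.0, 1.1)
    print(freq, round(left, 3), round(right, 3), left <= right)
```

For positive inputs the left side grows and the right side shrinks as CPUFreq rises, so the inequality changes truth value at most once over the range of candidate frequencies.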
Returning to decision 702, if the previous busy cycle did not include any frequency jumps and is not at the maximum performance level, the method may proceed to block 802 of
At block 802, the power controller may determine a busy time at max (BusyTimeAtMax) using the same formula described above. Thereafter, at decision 804, the power controller may determine whether the BusyTimeAtMax is greater than a maximum busy time at the maximum CPU frequency (MaxBusyTimeAtMax). If the BusyTimeAtMax is greater than the MaxBusyTimeAtMax, the method 600 may proceed to block 806 and the power controller may set the MaxBusyTimeAtMax equal to the BusyTimeAtMax. Then, the method 600 may move to block 808. At decision 804, if the BusyTimeAtMax is not greater than the MaxBusyTimeAtMax, the method 600 may move directly to block 808.
At block 808, the power controller may determine a running total busy time (TotalBusyTime) by adding the busy time (BusyTime) to the total busy time (TotalBusyTime). At block 810, the power controller may determine a non-jumping frequency (NonJumpingFrequency) using the same do loop described above in conjunction with block 706. Moving to block 812, the power controller may determine an energy saving frequency (EnergySavingFrequency). In a particular aspect, the EnergySavingFrequency is the lowest frequency (starting from the level calculated in block 810) that the CPU should be set to in order to save on energy consumption. In this aspect, the assumption may be made that the system will not need to jump for at least as long as the amount of time since the last lowering of frequency. Also, the assumption may be made that immediately after the same time period a jump will occur. This step also includes the clock switching overhead and the scheduling overhead.
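The bookkeeping performed from decision 804 through block 808, together with the reset at blocks 710 and 712, may be sketched as follows. The class and method names are hypothetical; only the two tracked quantities come from the description.

```python
from dataclasses import dataclass


@dataclass
class BusyStats:
    max_busy_time_at_max: float = 0.0   # MaxBusyTimeAtMax
    total_busy_time: float = 0.0        # TotalBusyTime

    def record_cycle(self, busy_time, busy_time_at_max):
        # Decision 804 and block 806: keep the largest BusyTimeAtMax so far.
        if busy_time_at_max > self.max_busy_time_at_max:
            self.max_busy_time_at_max = busy_time_at_max
        # Block 808: add the busy time to the running total.
        self.total_busy_time += busy_time

    def reset(self):
        # Blocks 710 and 712: both values are set back to zero.
        self.max_busy_time_at_max = 0.0
        self.total_busy_time = 0.0


stats = BusyStats()
stats.record_cycle(busy_time=2.0, busy_time_at_max=1.1)
stats.record_cycle(busy_time=1.0, busy_time_at_max=0.5)
```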
In a particular aspect, the amount of energy consumed at the current CPU frequency is determined. That value may be denoted DCVSFloorEnergy. Then, the CPU frequency may be raised until a value is found that has at most a one percent (1%) clock switch overhead, or that uses less energy than the current performance level. In a particular aspect, the one percent (1%) value is arbitrary and may be eliminated.
While the CPU frequency is less than the current CPU frequency and the elapsed time (elapsedTime) is less than clockSwitchOverhead times 2 times one hundred, the system may determine how long to run at the CPU frequency in order to see the exact same workload as the previous busy cycle. That value may be denoted as the DCVSJFloorBusyTime and may be determined using the following formula:
DCVSJFloorBusyTime=(totalBusyTime*CPUFreq/currentCPUFreq)+(clockSwitchOverhead*2)
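As a small illustration, the formula above may be transcribed term by term into a function. The function name, the parameter names, and the sample values are hypothetical, and the units are assumed to be consistent (e.g., milliseconds and megahertz).

```python
def floor_busy_time(total_busy_time, cpu_freq, current_cpu_freq,
                    clock_switch_overhead):
    # Direct transcription of the formula as stated in the description:
    # scaled total busy time plus two clock switch overheads.
    return (total_busy_time * cpu_freq / current_cpu_freq) + (clock_switch_overhead * 2)


# Hypothetical inputs: 2 ms total busy time, candidate 500 MHz,
# current 1000 MHz, 0.05 ms per clock switch.
sample = floor_busy_time(2.0, 500, 1000, 0.05)
```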
Also, the system may determine the amount of energy that the CPU would consume at the CPUFreq with the same workload as the previous busy cycle. Moving to decision 814, the power controller may determine whether the EnergySavingFrequency is less than the current CPUFrequency. If so, the method 600 may return to block 708 of
It is to be understood that the method steps described herein need not necessarily be performed in the order as described. Further, words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the method steps. Moreover, the methods described herein are described as executable on a portable computing device (PCD). The PCD may be a mobile telephone device, a portable digital assistant device, a smartbook computing device, a netbook computing device, a laptop computing device, a desktop computing device, or a combination thereof.
In a particular aspect, a DCVS algorithm is a mechanism which measures CPU load/idle time and dynamically adjusts the CPU clock frequency to track the workload in an effort to reduce power consumption while still providing satisfactory system performance. As the workload changes, the change in CPU throughput may track, but also necessarily lag, the changes in the workload. Unfortunately, this may introduce a problem in cases where the workload has Quality of Service (QoS) requirements, as the DCVS algorithm may not track the workload quickly enough. Further, tasks may fail. The performance (QoS) issues may be solved with the introduction of transient performance deadlines, i.e., explicit panics to a higher performance level; however, this may result in an actual increase in power due to frequency oscillations induced when transitioning between steady state and transient CPU frequencies.
The system and methods described herein may be used to manage the transition between transient and steady state performance levels. Further, the system and methods described herein may substantially reduce any oscillation. As a result, there may be substantial savings in net power consumed. As shown in
In order to avoid the problem of excessive frequency oscillations due to QoS deadlines and/or explicit panics, the present methods introduce an energy minimization algorithm which may control the transitions between the steady state and transient state, i.e., explicit panics to higher performance levels. The energy minimization methods, or algorithms, described herein may effectively manage the jumps between the maximum performance level caused by the transient response guarantee, i.e., the explicit jumps to higher performance levels, and the lower steady state performance level.
In the absence of any jumps between the two levels, the energy minimization algorithm can simply set the CPU performance level to the steady state value. As such, the methods described herein may determine how to lower the performance level down to the steady state level in the most energy efficient manner. Further, these methods may actively manage the performance level from the moment in time that a transient pulse, i.e., an explicit panic to a higher frequency, completes, until the performance level is taken back down to the level indicated by the steady state level. In general, the performance level may be taken down in discrete steps that eliminate the possibility of needing a jump to a higher performance level if the exact same idle/busy profile seen since the last drop in frequency (performance level) were repeated. For example, if a transient pulse causes a jump to the maximum clock frequency, then on the next idle period, the energy minimization methods described herein may set the performance level to that which would have eliminated the jump. On each succeeding idle period, a controller may determine the lowest frequency at or above the steady state performance level that would have saved energy, assuming that the exact same idle/busy profile is repeated from the point in time that the last frequency reduction was made.
In a particular aspect, the methods described herein may utilize other approaches to reduce the frequency from the higher performance level down to the steady state level. For example, steps may be time-based, steps may be linear, steps may be non-linear, a low pass filter based on jumps per second may be used, or any combination thereof may be used.
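By way of illustration only, a linear step schedule and one possible non-linear step schedule may be sketched as follows. Both functions, their names, and the choice of halving the remaining gap as the non-linear rule are assumptions; the description does not specify how such steps would be computed.

```python
def linear_steps(start_freq, steady_freq, num_steps):
    """Equal-sized reductions from start_freq down to steady_freq."""
    step = (start_freq - steady_freq) / num_steps
    return [start_freq - step * i for i in range(1, num_steps + 1)]


def halving_steps(start_freq, steady_freq, num_steps):
    """Non-linear variant: each step removes half of the remaining gap,
    and the final step lands on steady_freq."""
    freqs = []
    current = start_freq
    for _ in range(num_steps - 1):
        current = steady_freq + (current - steady_freq) / 2
        freqs.append(current)
    freqs.append(steady_freq)
    return freqs


# Hypothetical frequencies in MHz: from 1000 down to a steady state of 200.
linear = linear_steps(1000, 200, 4)
halving = halving_steps(1000, 200, 3)
```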
Further, the system and method described herein may ensure that excessive frequency changes are not made, despite the presence of QoS deadlines or explicit panics to higher performance levels. Accordingly, power consumption may be substantially lowered.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer program product such as a machine readable medium, i.e., a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
The present application claims priority to U.S. Provisional Patent Application Ser. No. 61/286,979, entitled SYSTEM AND METHOD OF DYNAMICALLY CONTROLLING POWER IN CENTRAL PROCESSING UNIT, filed on Dec. 16, 2009, the contents of which are fully incorporated by reference. The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER BASED ON INFERRED WORKLOAD PARALLELISM, by Rychlik et al., filed concurrently (Attorney Docket Number 100328U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER IN A VIRTUALIZED SYSTEM, by Rychlik et al., filed concurrently (Attorney Docket Number 100329U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR ASYNCHRONOUSLY AND INDEPENDENTLY CONTROLLING CORE CLOCKS IN A MULTICORE CENTRAL PROCESSING UNIT, by Rychlik et al., filed concurrently (Attorney Docket Number 100330U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED TRANSIENT DEADLINES, by Thomson et al., filed concurrently (Attorney Docket Number 100340U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED STEADY STATE DEADLINES, by Thomson et al., filed concurrently (Attorney Docket Number 100341U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR DYNAMICALLY CONTROLLING A PLURALITY OF CORES IN A MULTICORE CENTRAL PROCESSING UNIT BASED ON TEMPERATURE, by Sur et al., filed concurrently (Attorney Docket Number 100344U1).
Number | Date | Country
---|---|---
61286979 | Dec 2009 | US