The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER BASED ON INFERRED WORKLOAD PARALLELISM, by Rychlik et al., filed concurrently (Attorney Docket Number 100328U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER IN A VIRTUALIZED SYSTEM, by Rychlik et al., filed concurrently (Attorney Docket Number 100329U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR ASYNCHRONOUSLY AND INDEPENDENTLY CONTROLLING CORE CLOCKS IN A MULTICORE CENTRAL PROCESSING UNIT, by Rychlik et al., filed concurrently (Attorney Docket Number 100330U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH REDUCED FREQUENCY OSCILLATIONS, by Thomson et al., filed concurrently (Attorney Docket Number 100339U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED STEADY STATE DEADLINES, by Thomson et al., filed concurrently (Attorney Docket Number 100341U1). The present application is related to, and incorporates by reference, U.S. patent application Ser. No. ______, entitled SYSTEM AND METHOD FOR DYNAMICALLY CONTROLLING A PLURALITY OF CORES IN A MULTICORE CENTRAL PROCESSING UNIT BASED ON TEMPERATURE, by Sur et al., filed concurrently (Attorney Docket Number 100344U1).
Portable computing devices (PCDs) are ubiquitous. These devices may include cellular telephones, portable digital assistants (PDAs), portable game consoles, palmtop computers, and other portable electronic devices. In addition to the primary function of these devices, many include peripheral functions. For example, a cellular telephone may include the primary function of making cellular telephone calls and the peripheral functions of a still camera, a video camera, global positioning system (GPS) navigation, web browsing, sending and receiving emails, sending and receiving text messages, push-to-talk capabilities, etc. As the functionality of such a device increases, the computing or processing power required to support such functionality also increases. Further, as the computing power increases, there exists a greater need to effectively manage the processor, or processors, that provide the computing power.
Accordingly, what is needed is an improved method of controlling power within a multicore CPU.
In the figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
Referring initially to
In a particular aspect, as depicted in
Referring to
As illustrated in
As further illustrated in
As depicted in
In a particular aspect, one or more of the method steps described herein may be stored in the memory 344 as computer program instructions. These instructions may be executed by the multicore CPU 324 in order to perform the methods described herein. Further, the multicore CPU 324, the memory 344, or a combination thereof may serve as a means for executing one or more of the method steps described herein in order to dynamically control the power of each CPU, or core, within the multicore CPU 324.
Referring to
Moreover, as illustrated, the memory 404 may include an operating system 420 stored thereon. The operating system 420 may include a scheduler 422 and the scheduler 422 may include a first run queue 424, a second run queue 426, and an Nth run queue 428. The memory 404 may also include a first application 430, a second application 432, and an Nth application 434 stored thereon.
In a particular aspect, the applications 430, 432, 434 may send one or more tasks 436 to the operating system 420 to be processed at the cores 410, 412, 414 within the multicore CPU 402. The tasks 436 may be processed, or executed, as single tasks, threads, or a combination thereof. Further, the scheduler 422 may schedule the tasks, threads, or a combination thereof for execution within the multicore CPU 402. Additionally, the scheduler 422 may place the tasks, threads, or a combination thereof in the run queues 424, 426, 428. The cores 410, 412, 414 may retrieve the tasks, threads, or a combination thereof from the run queues 424, 426, 428 as instructed, e.g., by the operating system 420, for processing, or execution, of those tasks and threads at the cores 410, 412, 414.
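By way of non-limiting illustration, the relationship between the scheduler, the run queues, and the cores may be sketched as follows. The class name, the method names, and the round-robin placement policy are assumptions made for this sketch only; the description does not specify how the scheduler 422 selects a run queue.

```python
from collections import deque
from itertools import cycle


class Scheduler:
    """Toy scheduler holding one run queue per core (assumed layout)."""

    def __init__(self, num_cores):
        self.run_queues = [deque() for _ in range(num_cores)]
        # Round-robin placement is an assumption made for illustration only.
        self._next_queue = cycle(range(num_cores))

    def submit(self, task):
        """Place a task in the next run queue and return that queue's index."""
        index = next(self._next_queue)
        self.run_queues[index].append(task)
        return index

    def retrieve(self, core_index):
        """Return the oldest task queued for a core, or None if it is empty."""
        queue = self.run_queues[core_index]
        return queue.popleft() if queue else None


scheduler = Scheduler(num_cores=3)
placements = [scheduler.submit(name) for name in ("t0", "t1", "t2", "t3")]
```

In this sketch, each core would call retrieve() with its own index, so that tasks placed in a given run queue are executed in the order in which they were queued.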
Referring to
At block 504, a power controller, e.g., a dynamic clock and voltage scaling (DCVS) algorithm, may monitor one or more CPUs. At decision 506, the power controller may determine whether a transient performance deadline for a CPU has expired. If not, the method 500 may end. Otherwise, if the transient performance deadline has expired, the method 500 may proceed to block 508 and the power controller may move the CPU to a higher performance level, i.e., a next higher operating frequency. In one aspect, the controller may move the CPU to a maximum performance level, i.e., a maximum CPU frequency. However, in another aspect, the CPU may not jump to a maximum performance level. The CPU may jump to an intermediate level and then jump again, either to the maximum level or to another higher performance level. The number of intermediate jumps and the amount of time between jumps may be used to determine the frequency value of the jump.
At block 510, the CPU may enter an idle condition. Further, at block 512, the transient performance deadline may be reset. At block 514, the CPU may exit the idle condition. Moving to decision 516, the power controller may determine whether the upcoming CPU frequency is at a maximum CPU frequency. If so, the method 500 may end. Otherwise, if the CPU frequency is not at the maximum CPU frequency, the method may proceed to block 518 and the timer may be rescheduled. Then, the method 500 may end.
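By way of non-limiting illustration, the flow of the method 500 may be sketched as follows. The frequency table, the class and method names, the millisecond time base, and the use of a single fixed budget are assumptions made for this sketch. Clearing the deadline after a jump is likewise an assumption, since the description does not state what happens to the timer at that point.

```python
FREQUENCIES_MHZ = [300, 600, 900, 1200]  # hypothetical performance levels


class PowerController:
    """Minimal model of the deadline check, idle reset, and timer reschedule."""

    def __init__(self, budget_ms):
        self.level = 0            # index into FREQUENCIES_MHZ
        self.budget_ms = budget_ms
        self.deadline_ms = None   # None means no timer is scheduled

    def at_max(self):
        return self.level == len(FREQUENCIES_MHZ) - 1

    def check_deadline(self, now_ms):
        """Decision 506 and block 508: step up if the deadline has expired."""
        if self.deadline_ms is None or now_ms < self.deadline_ms:
            return False
        if not self.at_max():
            self.level += 1       # move to the next higher operating frequency
        self.deadline_ms = None   # assumption: the timer is cleared after a jump
        return True

    def enter_idle(self):
        """Blocks 510 and 512: an idle CPU resets the transient deadline."""
        self.deadline_ms = None

    def exit_idle(self, now_ms):
        """Blocks 514 to 518: reschedule the timer unless already at maximum."""
        if not self.at_max():
            self.deadline_ms = now_ms + self.budget_ms


controller = PowerController(budget_ms=16)
controller.exit_idle(now_ms=0)                     # timer set for t = 16 ms
early = controller.check_deadline(now_ms=10)       # deadline not yet expired
late = controller.check_deadline(now_ms=16)        # deadline expired, step up
```

In this sketch, the first check leaves the CPU at its current level, and the second check moves it from the 300 MHz entry to the 600 MHz entry of the hypothetical table.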
Referring to
At block 608, the CPU may enter a software wait for interrupt (SWFI) condition. At block 610, the CPU may exit the SWFI condition. Moving to block 612, the power controller may set an end idle time (EndIdleTime) equal to a current time (CurrentTime). Further, at block 614, the power controller may determine an idle time (IdleTime) by subtracting the start idle time (StartIdleTime) from the end idle time (EndIdleTime). At block 616, the power controller may determine an upcoming CPU frequency (CPUFreq) from an updated steady state filter (UpdateSteadyStateFilter), a busy time (BusyTime), and an idle time (IdleTime). Thereafter, the method 600 may continue to block 702 of
At block 702, the power controller may determine an effective transient budget (EffectiveTransientBudget) using the following formula:
EffectiveTransientBudget=(TransientResponseDeadline*NextCPUFreq)/(NextCPUFreq-CPUFreq)
where,
TransientResponseDeadline=A transient response deadline, i.e., slack budget,
NextCPUFreq=A next CPU frequency that is one frequency step higher than an upcoming CPU frequency, and
CPUFreq=An upcoming CPU frequency.
In a particular aspect, a clock scheduling overhead (ClockSchedulingOverhead) and a clock switch overhead (ClockSwitchOverhead) may also be added to the EffectiveTransientBudget. Further, a voltage change overhead (VoltageChangeOverhead) may be added to the EffectiveTransientBudget. Moving to block 704, the power controller may set a deadline to jump to a higher frequency (SetJumpToFrequency) equal to the end idle time (EndIdleTime) plus the effective transient budget (EffectiveTransientBudget). In another aspect, the deadline to jump may be the current time plus the transient budget. Thereafter, the method 600 may end.
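By way of non-limiting illustration, blocks 612 through 704 may be sketched as follows. The function names, the millisecond and megahertz units, and the single combined overhead parameter are assumptions made for this sketch. The scaling expression is also an assumption, chosen so that it agrees with the example given elsewhere in this description in which a 16 ms deadline at 75% of the maximum clock rate yields a 64 ms budget.

```python
import math


def idle_time(start_idle_ms, end_idle_ms):
    """Block 614: IdleTime is EndIdleTime minus StartIdleTime."""
    return end_idle_ms - start_idle_ms


def effective_transient_budget(deadline_ms, cpu_freq, next_cpu_freq,
                               overheads_ms=0.0):
    """Block 702: scale the transient response deadline to the current level.

    overheads_ms stands in for the clock scheduling, clock switch, and
    voltage change overheads, which the description says may be added.
    """
    if next_cpu_freq <= cpu_freq:
        # No higher level exists to jump to, so the budget never runs out.
        return math.inf
    scaled = deadline_ms * next_cpu_freq / (next_cpu_freq - cpu_freq)
    return scaled + overheads_ms


def jump_deadline(end_idle_ms, budget_ms):
    """Block 704: the deadline is EndIdleTime plus the effective budget."""
    return end_idle_ms + budget_ms


idle_ms = idle_time(start_idle_ms=950.0, end_idle_ms=1000.0)
budget_ms = effective_transient_budget(16.0, cpu_freq=900, next_cpu_freq=1200)
deadline_ms = jump_deadline(end_idle_ms=1000.0, budget_ms=budget_ms)
```

With these hypothetical values, the CPU was idle for 50 ms, the effective transient budget is 64 ms, and the jump to the higher frequency would be scheduled for 1064 ms.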
In a particular aspect, the method 600 described in conjunction with
It is to be understood that the method steps described herein need not necessarily be performed in the order as described. Further, words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the method steps. Moreover, the methods described herein are described as executable on a portable computing device (PCD). The PCD may be a mobile telephone device, a portable digital assistant device, a smartbook computing device, a netbook computing device, a laptop computing device, a desktop computing device, or a combination thereof.
In a particular aspect, a DCVS algorithm is a mechanism which measures CPU load/idle time and dynamically adjusts the CPU clock frequency to track the workload in an effort to reduce power consumption while still providing satisfactory system performance. As the workload changes, the change in CPU throughput may track, but also necessarily lag, the changes in the workload. Unfortunately, this may introduce a problem in cases where the workload has Quality of Service (QoS) requirements, as the DCVS algorithm may not track the workload quickly enough and, as a result, tasks may fail.
Many DCVS techniques involve measuring the steady state performance requirements of the CPU and setting the CPU frequency and voltage to the lowest level that may meet the steady state CPU usage. This is typically done by measuring the CPU utilization (percentage busy) over a period of time and setting the CPU performance level to one in which the average CPU utilization falls between a high and low threshold. The averaging period is optimized to minimize the frequency of changing clock frequencies, while maintaining reasonable responsiveness. In order to respond to transient workloads and/or the start of new workloads, panic inputs may have been utilized to quickly bring up the CPU frequency.
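By way of non-limiting illustration, a threshold-based steady state selection of the kind described above may be sketched as follows. The function name, the threshold values of 30% and 80%, and the single-step level changes are assumptions made for this sketch and are not taken from any particular DCVS implementation.

```python
def steady_state_level(busy_ms, idle_ms, level, num_levels,
                       low=0.30, high=0.80):
    """Return the performance level index for the next averaging period.

    The level moves up one step when utilization exceeds the high threshold,
    down one step when it falls below the low threshold, and otherwise stays.
    """
    total_ms = busy_ms + idle_ms
    if total_ms <= 0:
        return level                      # nothing measured, keep the level
    utilization = busy_ms / total_ms      # fraction of the period spent busy
    if utilization > high and level < num_levels - 1:
        return level + 1
    if utilization < low and level > 0:
        return level - 1
    return level


raised = steady_state_level(busy_ms=90, idle_ms=10, level=1, num_levels=4)
lowered = steady_state_level(busy_ms=10, idle_ms=90, level=1, num_levels=4)
held = steady_state_level(busy_ms=50, idle_ms=50, level=1, num_levels=4)
```

Because a selection of this kind acts only once per averaging period, it may lag a sudden increase in the workload, which is the behavior that the transient performance guarantee is intended to bound.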
In order to avoid the problem of the DCVS lagging the workload and causing tasks to fail, the system and methods disclosed herein provide a transient performance guarantee. The transient performance guarantee may be defined as the maximum amount of time that a continuously busy pulse may be delayed, as compared to running at the higher performance level. This may be accomplished by getting to the higher performance level prior to the transient performance deadline expiring and resetting the deadline whenever the CPU goes idle, since if the CPU is idle, it is by definition not in an oversubscribed state. As disclosed herein, the timer may be rescheduled to preserve the QoS guarantee whenever the system comes out of idle and the system CPU is not running at the maximum frequency.
In order to minimize the power impact of the transient performance guarantee, the present system and methods minimize the likelihood that an incoming pulse may require a frequency increase in order to meet the deadline. This may be accomplished by delaying the frequency, i.e., performance level, change until the effective transient budget has been exhausted and then jumping straight to the higher performance level and staying there until the pulse is complete as shown in
In a particular aspect, the effective transient budget is calculated as the transient response deadline scaled to the current performance level. For example, if the CPU is running at 75% of the maximum clock rate and the transient response deadline is 16 ms, the effective transient budget is 64 ms, i.e., 16 ms/(1-0.75). The effective transient budget represents how long the CPU may run at the current performance level prior to exhausting the budget. If the CPU is idle, the effective transient budget may be the same as the transient response deadline. If the CPU is at the maximum performance level, the effective transient budget is infinite as shown in
Using the methods described herein, the system may provide a strict bound on the maximum amount of time a task might run at some level other than the maximum level, and therefore implicitly provide a calculable bound on completion for tasks that require QoS guarantees, while still allowing dynamic CPU clock scaling. The bound may be set based on what tasks are currently running, a global system property, DCVS algorithm design or other properties, and may be entirely disabled if the system is not running any tasks that have QoS requirements or if the CPU is running at max clock.
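By way of non-limiting illustration, the calculable bound on completion may be checked with the following sketch, which models a continuously busy pulse that runs at a reduced clock rate until the effective transient budget is exhausted and then runs at the maximum clock rate. The function name, the direct jump to the maximum level, and the omission of clock switch and voltage change overheads are assumptions made for this sketch.

```python
def completion_time_ms(work_ms, fraction, deadline_ms):
    """Time to finish a busy pulse that needs work_ms at the maximum clock.

    fraction is the current clock rate as a fraction of the maximum,
    with 0 < fraction <= 1.
    """
    if fraction >= 1.0:
        return work_ms
    budget_ms = deadline_ms / (1.0 - fraction)   # effective transient budget
    if work_ms / fraction <= budget_ms:
        return work_ms / fraction                # finishes before any jump
    done_before_jump = fraction * budget_ms      # work retired at the lower clock
    return budget_ms + (work_ms - done_before_jump)


# A 16 ms transient response deadline at 75% of the maximum clock rate.
long_pulse_delay = completion_time_ms(100.0, 0.75, 16.0) - 100.0
short_pulse_delay = completion_time_ms(30.0, 0.75, 16.0) - 30.0
```

Under these assumptions, the long pulse is delayed by exactly 16 ms relative to running at the maximum clock rate, and the short pulse is delayed by 10 ms, so neither delay exceeds the transient response deadline.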
In a particular aspect, the present methods may be extended by, instead of jumping to the maximum frequency when the deadline has expired, setting shorter internal effective deadlines and jumping to one, or more, intermediate frequencies, while still ensuring that the CPU is at the maximum frequency before the maximum QoS delay has been exhausted. Further, the present methods may substantially ensure that a well defined transient QoS is maintained, while simultaneously reducing overall CPU power.
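By way of non-limiting illustration, one way of deriving shorter internal effective deadlines for intermediate frequencies may be sketched as follows. The function name and the equal division of the allowed delay among the levels below the maximum are assumptions made for this sketch; the description does not prescribe how the internal deadlines are chosen. The sketch relies on the observation that running for a time t at a fraction f of the maximum clock rate delays a busy pulse by t*(1-f).

```python
def jump_schedule(deadline_ms, levels, start_index):
    """Return (time_ms, target_fraction) jumps that end at the maximum level.

    levels lists the clock rates in ascending order as fractions of the
    maximum, so the last entry is 1.0. Each level below the maximum is
    given an equal share of the allowed delay (an assumption).
    """
    below_max = levels[start_index:-1]
    if not below_max:
        return []                               # already at the maximum level
    share_ms = deadline_ms / len(below_max)
    schedule = []
    elapsed_ms = 0.0
    for offset, fraction in enumerate(below_max):
        # Time at this level that consumes exactly share_ms of delay.
        elapsed_ms += share_ms / (1.0 - fraction)
        schedule.append((elapsed_ms, levels[start_index + offset + 1]))
    return schedule


schedule = jump_schedule(16.0, levels=[0.5, 0.75, 1.0], start_index=0)
```

With these hypothetical levels, the CPU would jump to 75% of the maximum clock rate after 16 ms and to the maximum clock rate after 48 ms, and the accumulated delay of a continuously busy pulse would be 16*0.5 + 32*0.25 = 16 ms, which matches the transient response deadline.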
The system and methods described herein may utilize opportunistic sampling. In other words, the system and methods may check for timer expiration on a periodic basis. In other aspects, the system and methods may not utilize opportunistic sampling.
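By way of non-limiting illustration, checking for timer expiration on a periodic basis may be sketched as follows. The class name, the method names, and the 10 ms polling period are assumptions made for this sketch. The sketch also shows that a deadline checked in this manner may be detected up to one polling period late.

```python
class OpportunisticTimer:
    """Deadline that is checked only when the system happens to poll it."""

    def __init__(self):
        self.deadline_ms = None

    def schedule(self, deadline_ms):
        self.deadline_ms = deadline_ms

    def poll(self, now_ms):
        """Return True once, on the first poll at or after the deadline."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self.deadline_ms = None
            return True
        return False


timer = OpportunisticTimer()
timer.schedule(deadline_ms=64)
poll_times_ms = range(0, 101, 10)          # hypothetical 10 ms polling period
fired_at = [t for t in poll_times_ms if timer.poll(t)]
```

In this sketch, a deadline of 64 ms is detected at the 70 ms poll, so an implementation relying on periodic checks might shorten the scheduled deadline by one polling period in order to preserve the guarantee; that adjustment is a suggestion of this sketch rather than a feature stated in the description.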
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer program product such as a machine readable medium, i.e., a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
The present application claims priority to U.S. Provisional Patent Application Ser. No. 61/286,991, entitled SYSTEM AND METHOD OF DYNAMICALLY CONTROLLING POWER IN A CENTRAL PROCESSING UNIT, filed on Dec. 16, 2009, the contents of which are fully incorporated by reference.
Number | Date | Country
---|---|---
61286991 | Dec 2009 | US