1. Field of the Invention
The present invention is related to computer systems in which processor clock frequencies are adaptively adjusted in response to dynamic measurements of operating conditions, and in particular to a computer system in which power supply voltage domains are adjusted to cause an adaptive change in performance of the processors in the corresponding voltage domains.
2. Description of Related Art
In recent computer systems, processor cores adaptively adjust their performance, e.g., by adjusting the processor clock frequency, so that under most operating conditions and with most production processors, higher operating frequencies can be achieved than could otherwise be specified. A specified maximum operating frequency for a given power supply voltage, and similarly a specified minimum power supply voltage for a given operating frequency, are necessarily conservative because they must cover the full operating ranges of temperature and voltage, as well as the range of manufacturing process variation for the particular device, i.e., the processor integrated circuit (IC). Workload differences also contribute to the need for operating margins that ensure fail-safe operation, since the local voltages and temperatures at particular processor cores, and at particular locations within each processor core, vary with the program code being executed and with the data or other input being processed. With an adaptive adjustment scheme, however, the effects of process, temperature and voltage can be taken into account, permitting much less conservative operation than is possible with a fixed clocking scheme.
One technique for adaptively adjusting processor core clock frequency periodically measures the propagation delay of one or more circuits that synthesize a critical signal path in the processor core. The critical path is the signal path that determines the maximum operating frequency of the processor core under the instant operating conditions, i.e., the path that will cause an operating failure if the processor clock frequency is increased beyond the absolute maximum frequency for those conditions. The critical path may change under differing operating conditions, e.g., with changes in temperature, power supply voltage or workload. Therefore, the critical path monitoring circuits (CPMs) described above generally include some flexibility in how they simulate/synthesize the critical path delay, as well as the computational ability to combine the results of simpler delay components into a result for a more complex, and typically longer, critical path. Other techniques use ring oscillators to determine the effects of environmental factors and process on circuit delay. Once the critical path delay is known for the present temperature and power supply voltage, the processor clock frequency can be increased to take advantage of any available headroom. In one implementation, multiple CPMs distributed around the processor IC die provide information to a clock generator within the processor IC that uses a digital phase-lock loop (DPLL) to generate the processor clock. The combined information allows the clock generator to adapt the processor clock to the instant operating conditions of the processor IC, as well as to the processor IC's own characteristics resulting from process variation.
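The combination of simple delay components into a longer critical-path estimate, and the choice of a clock frequency from the slowest monitored path, can be sketched as follows. This is a minimal illustration under stated assumptions: the `CpmReading` class and its fields, the linear weighted-sum model of path delay, and the fractional `margin` parameter are hypothetical and do not describe any particular CPM hardware interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CpmReading:
    """Hypothetical snapshot of one critical path monitor (CPM).

    component_delays_ps: measured delays of simple delay elements, in picoseconds.
    weights: how many times each element appears in the synthesized path.
    Both fields are illustrative assumptions, not a real CPM interface.
    """
    component_delays_ps: Sequence[float]
    weights: Sequence[float]

    def synthesized_path_delay_ps(self) -> float:
        # Combine the simple component delays into one longer critical-path estimate.
        return sum(d * w for d, w in zip(self.component_delays_ps, self.weights))


def max_safe_frequency_ghz(cpms: Sequence[CpmReading], margin: float = 0.05) -> float:
    """Highest clock frequency whose period covers every CPM path plus a safety margin."""
    # The slowest synthesized path anywhere on the die limits the shared clock.
    worst_delay_ps = max(c.synthesized_path_delay_ps() for c in cpms)
    # The period must cover the worst delay plus the fractional margin; 1000 / ps gives GHz.
    return 1000.0 / (worst_delay_ps * (1.0 + margin))
```

Taking the maximum delay across all CPMs reflects that a single processor clock must satisfy the slowest path on the die; the margin stands in for the safety margin that the adaptive scheme still retains.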
Other techniques for adjusting processor frequency under dynamic operating conditions use extrinsic environmental information, e.g., the temperature and power supply voltage inside or outside the processor IC die, to estimate the maximum processor frequency, rather than the more direct approach of measuring the delay of a synthesized critical path. While extrinsic measurements do not typically account for process variation, a significant performance advantage can still be realized by compensating for temperature and voltage variation, especially for processor ICs in which manufacturing process variation has a relatively minor impact on clock frequency. Further, other throttling mechanisms, such as adjusting the instruction dispatch, fetch or decode rates of the processor cores, can be used to adjust the effective processor clock frequency and thereby adapt the performance/power operating point of a processor in conformity with environmental measurements.
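An extrinsic estimate of this kind can be sketched as a first-order model around a characterized nominal operating point. The function name, the linear form and every coefficient below are hypothetical placeholders chosen for illustration; because process variation is not measured, the sketch covers it only through a fixed guardband.

```python
def estimate_fmax_ghz(vdd_v: float,
                      temp_c: float,
                      f_nominal_ghz: float = 4.0,
                      v_nominal_v: float = 1.0,
                      t_nominal_c: float = 85.0,
                      ghz_per_volt: float = 3.0,
                      ghz_per_deg_c: float = -0.002,
                      guardband: float = 0.03) -> float:
    """Estimate a safe clock frequency from extrinsic measurements only.

    Linear model around a nominal point; all coefficients are hypothetical.
    """
    # First-order correction for supply voltage above or below nominal.
    f = f_nominal_ghz + ghz_per_volt * (vdd_v - v_nominal_v)
    # First-order correction for temperature (slower when hotter in this model).
    f += ghz_per_deg_c * (temp_c - t_nominal_c)
    # Reserve a fixed fractional margin for unmodeled effects such as process variation.
    return f * (1.0 - guardband)
```

For example, `estimate_fmax_ghz(1.1, 85.0, guardband=0.0)` yields about 4.3 GHz under these made-up coefficients. The fixed guardband is exactly the margin that the direct CPM approach can shrink, since CPMs observe the effect of process variation rather than assuming a worst case.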
Once a system is implemented using adaptively-clocked processors, such as those described above, the individual frequencies of the processor cores will necessarily vary within the system, distributed according to the local power supply voltages and temperatures, the process characteristics of the individual processors, and the workloads being executed, as each core achieves the maximum performance available while maintaining some safety margin. Such operation is not necessarily desirable. For example, in distributed computing applications that serve multiple computing resource customers, such as virtual machines hosting web servers or other cloud computing applications, the processor clock frequency or other measure of performance of one or more cores assigned to a particular virtual machine may be specified as an absolute minimum, below which performance cannot be permitted to fall. Exceeding the specified performance by too great a margin is also undesirable, as such operation typically wastes power. Further, in some applications, accounting of processor usage may be tied to the processor clock frequency or another performance metric, which could result in a higher charge for a processor operating at a frequency above the one specified for a customer's requirements.
Therefore, it would be desirable to provide a control method and system that controls processor performance in a system that has one or more processors individually clocked by an environmentally-adaptive clocking scheme.
The invention is embodied in a method, a computer program product and a computer system, in which the performance of an individual processor core, or of a group of processor cores within a voltage domain, is adjusted to obtain a target minimum performance by varying the power supply voltage for the domain according to an outer feedback loop. The processor cores have inner feedback loops that adjust their processor clock frequencies, or another performance control mechanism such as instruction issue rate, to maximize performance under the current set of operating conditions while maintaining a margin of safety. The computer program product includes program instructions for carrying out the method, and the computer system is managed according to the method.
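A single iteration of such an inner loop, using clock frequency as the performance control, can be sketched as below. It is a hedged illustration rather than the mechanism of any specific processor: the slack encoding (spare delay elements reported per CPM), the thresholds, the asymmetric step sizes and the frequency limits are all hypothetical. Backing off faster than it speeds up is a design choice in this sketch that favors fail-safe operation.

```python
from typing import Sequence


def inner_loop_step(freq_ghz: float,
                    cpm_slacks: Sequence[int],
                    slack_low: int = 2,
                    slack_high: int = 5,
                    down_step_ghz: float = 0.1,
                    up_step_ghz: float = 0.025,
                    f_min_ghz: float = 2.0,
                    f_max_ghz: float = 5.0) -> float:
    """One iteration of a hypothetical CPM-driven inner loop.

    cpm_slacks holds, for each CPM, the remaining timing margin expressed as a
    count of spare delay elements (an assumed encoding). The worst CPM governs.
    """
    worst = min(cpm_slacks)
    if worst < slack_low:
        # Margin is nearly gone (e.g. a supply voltage droop): back off quickly.
        return max(f_min_ghz, freq_ghz - down_step_ghz)
    if worst > slack_high:
        # Ample headroom everywhere: creep upward slowly to claim it.
        return min(f_max_ghz, freq_ghz + up_step_ghz)
    # Inside the dead band: hold the current frequency.
    return freq_ghz
```

Because this loop always pushes toward the maximum safe frequency, its result depends on the supply voltage it is given, which is the lever the outer loop uses.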
The method periodically measures the performance of one or more individual cores and compares the measured performance to a target performance to obtain a performance deviation. The power supply voltages of the voltage domains powering the core(s) are then adjusted to ensure that all cores in each voltage domain meet their target performance. The performance measure may be the processor clock frequency, or another performance metric such as the instruction dispatch/completion rate. The measurement may compute an average frequency or other performance metric over a measurement interval, and the method may further estimate the required power supply voltages from the computed frequency or other performance metric deviations. Alternatively, the method may adjust the power supply voltages in small increments until the target performance is reached for the cores.
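Measuring an average frequency over an interval and estimating the required domain voltage from the resulting deviations might look like the sketch below. It assumes a free-running per-core cycle counter and a roughly linear frequency-versus-voltage response of the inner loop near the operating point; the function names, the sensitivity `ghz_per_volt` and the voltage limits are hypothetical.

```python
from typing import Mapping


def average_frequency_ghz(cycles_start: int, cycles_end: int, interval_s: float) -> float:
    """Average clock frequency over one measurement interval, from a cycle counter."""
    return (cycles_end - cycles_start) / interval_s / 1e9


def estimate_domain_voltage(current_v: float,
                            measured_ghz: Mapping[str, float],
                            target_ghz: Mapping[str, float],
                            ghz_per_volt: float = 3.0,
                            v_min: float = 0.8,
                            v_max: float = 1.2) -> float:
    """Estimate the supply voltage at which every core in one domain meets its target."""
    # Performance deviation per core: negative means the core is below target.
    deviations = [measured_ghz[c] - target_ghz[c] for c in target_ghz]
    # The core with the most negative (or least positive) deviation sets the voltage.
    worst = min(deviations)
    # Convert the frequency deviation into a voltage correction via the assumed sensitivity.
    proposed = current_v - worst / ghz_per_volt
    # Keep the request within the regulator's permitted range.
    return max(v_min, min(v_max, proposed))
```

With a 0.3 GHz shortfall on the slowest core and the assumed 3 GHz/V sensitivity, the estimate raises the domain voltage by about 0.1 V; a 0.3 GHz surplus on every core lowers it by the same amount.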
The foregoing and other objectives, features, and advantages of the invention will be apparent from the following, more particular, description of the preferred embodiment of the invention, as illustrated in the accompanying drawings.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of the invention when read in conjunction with the accompanying Figures, wherein like reference numerals indicate like components, and:
The present invention encompasses techniques for improving power efficiency in processing systems having multiple cores. Each core adapts a performance control, such as processor frequency or instruction dispatch rate, according to an inner feedback loop to maximize performance within the limits of fail-safe operation, as determined by an environmental and process monitoring circuit such as a critical path monitor (CPM). The performance of the cores is controlled by an outer feedback loop that determines whether target performance levels are being met by all of the processors in a given voltage domain. If one or more of the processors is not meeting its corresponding target performance level, the power supply voltage supplied to the given voltage domain is increased. Otherwise, if all of the processor cores in a voltage domain are exceeding their target performance levels, the voltage supplied to the voltage domain can be decreased, conserving energy.
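The increase/decrease rule for one voltage domain can be sketched as a single outer-loop iteration. This is a minimal illustration, assuming per-core performance values have already been measured for the interval; the function name, the 10 mV step and the voltage limits are hypothetical.

```python
from typing import Mapping


def step_domain_voltage(current_v: float,
                        measured: Mapping[str, float],
                        target: Mapping[str, float],
                        step_v: float = 0.01,
                        v_min: float = 0.8,
                        v_max: float = 1.2) -> float:
    """One outer-loop iteration for a single voltage domain."""
    if any(measured[c] < target[c] for c in target):
        # At least one core misses its target: raise the domain voltage.
        return min(v_max, current_v + step_v)
    if all(measured[c] > target[c] for c in target):
        # Every core exceeds its target: lower the voltage to save energy.
        return max(v_min, current_v - step_v)
    # Some core sits exactly at its target and none is below: hold.
    return current_v
```

This incremental variant needs no model of the voltage/frequency relationship. With a monotonic response it typically settles into alternating between two adjacent voltage levels around the lowest voltage at which the slowest core meets its target.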
Referring now to
The system of
In the depicted example, each of processors 10A-10D represents a single voltage domain for the power supply distribution scheme. Although each of processors 10A-10D receives three power supply voltages, VDD, VIO and VCS, power supply voltages VDD and VCS are controlled together and represent a single voltage domain for the purposes of the present illustration, and power supply voltage VIO is not controlled by the outer feedback loop. Power supply voltage VIO is representative of a power supply voltage that supplies a small fraction of the overall power consumption of processors 10A-10D, on which performance is not strongly dependent, for which performance is not characterized, and/or that cannot be varied due to interface constraints. Service processors 11A-11D also obtain performance information from cores 20A-20B within each of processors 10A-10D via a service processor interface, and receive target performance levels for cores 20A-20B within each of processors 10A-10D from the system; this information is used to determine the voltages generated by VRMs 13A-13D according to the outer feedback loop. While the illustrated system shows a core-to-voltage-domain ratio of 2:1, the voltage domains can be per-core, or may encompass larger numbers of cores. The closer the core-to-voltage-domain relationship is to per-core, the more efficient a system using the illustrated techniques, because when the voltage to each core can be controlled independently, the voltage at each core can be set to its optimum value. Otherwise, some cores within a voltage domain may receive a higher voltage than necessary to achieve their performance targets, because some other core(s) in the voltage domain require the voltage level being demanded by the outer feedback loop.
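The efficiency penalty of a shared domain can be made concrete with a small sketch. It assumes, purely for illustration, that the minimum voltage at which each core's inner loop would reach its target is known; the function name and the example voltages are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def shared_domain_voltage(required_v: Mapping[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the domain voltage and each core's excess voltage above its own need.

    required_v maps each core to the minimum voltage at which that core would
    reach its performance target (assumed known for illustration).
    """
    # The most demanding core dictates the voltage for the whole domain.
    domain_v = max(required_v.values())
    # Every other core receives more voltage than it needs.
    excess = {core: domain_v - v for core, v in required_v.items()}
    return domain_v, excess
```

For `{"20A": 0.95, "20B": 1.05}` the domain runs at 1.05 V and core 20A carries about 0.1 V of excess; with per-core domains every excess would be zero. Because dynamic energy per cycle scales roughly with the square of supply voltage, core 20A would spend about (1.05/0.95)² ≈ 1.22 times the energy per cycle it needs, before counting the higher frequency its inner loop would then select.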
Referring now to
A clock generator 26 provides an internal clock source for processor core 20, generally using a digital phase-lock loop (DPLL) that multiplies an externally-supplied lower-frequency clock signal by a large factor. In processor core 20, the processor clock frequency generated by clock generator 26 is controlled by the outputs of CPMs 24 so that fail-safe operation is ensured. CPMs 24 provide very rapid inner-loop feedback that compensates for rapid drops in power supply voltage due to workload increases, as well as for rises in temperature, so that fail-safe operation is maintained with a much lower frequency margin, and therefore a higher clock frequency and performance level, than would otherwise be required. A workload is a set of instructions executed by processor core 20 together with the rate of execution of those instructions, and may include the particular data operated on by those instructions; the instruction/data mix places a particular demand on the resources of processor core 20, at a particular rate, that varies from workload to workload. While the details of CPMs 24 are as illustrated further below with reference to
Referring now to
Referring now to
Referring now to
The setting of performance levels of cores 40 (or single core 40) within the system is performed by the outer feedback loop illustrated in
Referring now to
Referring now to
While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention.
Number | Date | Country | |
---|---|---|---|
20120005513 A1 | Jan 2012 | US |