The field of invention pertains generally to computing systems, and, more specifically, to a computing system and processor with fast power surge detection and instruction throttle down to provide for a low cost power supply unit.
In order to provide a “stable” supply voltage to the processor 101, the voltage regulator 102 receives, at input 104, an input voltage that is higher than the supply voltage at supply node 103. For example, modern day voltage regulators that supply a +1.8 V supply voltage can typically accept a voltage anywhere within a range of +4.0 V to +36.0 V at input 104. The voltage regulator 102 therefore “steps down” the voltage received at input 104 (e.g., +12.0 V) to the supply voltage provided at supply node 103 (e.g., +1.8 V). According to one view, the stepping down activity of the voltage regulator 102 permits a “steady” supply voltage at node 103 in the face of dramatic swings in current draw from the processor 101.
When the processor does draw significant amounts of current, an effect can be observed at input node 104. Specifically, a sudden current draw derived from the increase in power demanded by the processor 101 and the inefficiency of the voltage regulator 102 will be observed at node 104. For example, consider a processor that receives a supply voltage of +1.8 V at supply node 103 and nominally draws a current of 36 Amps (A). A +1.8 V supply voltage and 36 A current draw corresponds to 65 Watts (W) of power dissipation in the processor ((1.8 V)*(36 A)=65 W). The power supply unit 105 will need to supply not only enough power for the processor (65 W) but also additional power to compensate for the less than perfect efficiency of the voltage regulator 102.
For example, if the regulator 102 is 80% efficient, which is presently typical, the power supply unit 105 needs to provide the voltage regulator 102 with roughly 25% more power than the processor consumes. That is, ((65 W)/0.8)≈81 W needs to be provided by the power supply unit 105 to the voltage regulator 102. If the power supply unit 105 feeds a +12 V input voltage to the voltage regulator 102 at node 104, the voltage regulator's current draw from the power supply unit will be ((81 W)/12 V)≈6.75 A. (Note that one effect of the step down conversion from +12 V to +1.8 V by the voltage regulator 102 is that the voltage regulator 102 draws comparatively less current than the processor 101.)
If the processor 101 suddenly increases its current draw demand from 36 A to 56 A, the power supply unit 105 will observe a current draw increase by the voltage regulator 102 from about 6.75 A to 10.42 A (assuming the voltage provided by the power supply unit stays fixed at +12 V). That is, the power dissipation in the processor 101 will increase to (56 A)*(1.8 V)≈100 W. To account for the less than perfect efficiency of the voltage regulator 102, the power supply unit will need to supply 100 W/0.8=125 W to the voltage regulator 102. Supplying 125 W at +12 V corresponds to 125 W/12 V=10.42 A.
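The arithmetic above can be reproduced with a short Python sketch. The function name and parameter names are illustrative assumptions, not part of the described system. Because the text rounds intermediate values (e.g., 64.8 W to 65 W), the unrounded results computed here can differ slightly from the figures quoted above.

```python
def psu_current_draw(supply_voltage_v, processor_current_a,
                     regulator_efficiency, psu_voltage_v):
    """Current (A) the voltage regulator draws from the power supply unit.

    Hypothetical helper: models the regulator as a fixed-efficiency
    step-down converter, as in the worked example in the text.
    """
    # Power dissipated in the processor at the supply node.
    processor_power_w = supply_voltage_v * processor_current_a
    # Power the regulator must take in to deliver that output power.
    regulator_input_power_w = processor_power_w / regulator_efficiency
    # Current drawn from the power supply unit at its output voltage.
    return regulator_input_power_w / psu_voltage_v


# Nominal case from the text: +1.8 V, 36 A, 80% efficiency, +12 V input.
nominal_a = psu_current_draw(1.8, 36.0, 0.8, 12.0)
# Surge case from the text: processor current rises to 56 A.
surge_a = psu_current_draw(1.8, 56.0, 0.8, 12.0)
print(f"nominal: {nominal_a:.2f} A, surge: {surge_a:.2f} A")
```

The sketch makes the step-down effect visible: a 20 A increase at the processor appears as an increase of only a few amps at the power supply output.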
The above analysis bears out that the power supply unit 105, owing to the inefficiency of the voltage regulator 102, is typically designed to supply significantly more power than the processor consumes. Typically, the more power a power supply unit 105 is designed to provide, the larger and more expensive the power supply unit becomes.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
A problem is that as processor power consumption continues to increase (e.g., due to increasing transistor counts, die size and clock speed), so too does the maximum power rating of the power supply unit 105. Making matters worse is that the maximum power draw of the processor can, in certain rare situations (e.g., an “optimized power virus loop”), far exceed its “typical” maximum power draw (e.g., at its highest performance state under a workload that is more typical of the kind of workloads that cause the processor to enter its highest performance state). For example, the rated Pmax power draw of a processor may be 100% higher than what a processor normally draws when processing the kind of workload that is typical when the processor is operating at its highest performance state.
Here, Pmax is closer to a measure of the theoretical worst case power draw of the processor rather than what a processor will typically draw when asked to perform its largest workloads in a real world application. For example, Pmax may correspond to the power drawn when the processor is asked to process a continuous stream of the most energy consuming instruction(s) at the processor's highest supply voltage and operating frequency. In real world applications, such an instruction stream is unlikely. Nevertheless, systems are designed to handle the Pmax event should it happen. As such, the power supply unit 105 tends to be designed with a size and cost that is well beyond what would otherwise be sufficient under normal operating circumstances.
As such, referring to
The fast power sense circuitry 206 can detect an increase in power draw at the power supply output 207 with specially designed analog and/or digital circuitry that measures, for example, the current draw from the voltage regulator 202, or, the current draw from the voltage regulator 202 and/or the voltage provided by the power supply unit 205.
In response to its fast detection that the power draw from the voltage regulator 202 has exceeded a pre-established threshold, the power sense circuitry 206 raises a fast throttle down signal 208 to the processor 201. The fast throttle down signal 208 is received at an input 211 of the processor 201, and routed through a “quick” signal path 209 within the processor 201 to logic circuitry 210 that controls, in some manner, the rate at which instructions are executed by the instruction execution pipeline(s) 213 within the processor 201. For example, logic circuitry 210 may control the rate at which instructions for the pipeline(s) 213 are fetched (e.g., from cache, system memory or both) and/or the rate at which fetched instructions are fed (issued) to the pipeline(s) 213.
The quick signal path 209 is designed so that the fast throttle down signal 208 endures only a small propagation delay end-to-end from the processor input 211 to logic circuitry 210. Small propagation delay can be effected, for instance, by minimizing the number of logic gates or other types of logic processing between the input 211 and logic circuitry 210. The quick signal path 209 may also be implemented, at least in sections, as a transmission line with controlled (e.g., specifically designed) characteristic impedance to minimize signal distortion as it propagates through the processor.
The transmission line may be driven by a driver circuit having a source impedance that substantially matches the characteristic impedance of the transmission line, and, may be terminated with a termination resistance that matches the characteristic impedance of the transmission line. Conceivably, the end-to-end run length of the quick path 209 may be broken down into a series of transmission line segments, for example, where each segment has its own driver and termination pair as discussed just above.
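The benefit of matching the termination to the characteristic impedance can be illustrated with the standard transmission-line reflection coefficient, Γ = (Z_L − Z_0)/(Z_L + Z_0). This relation is general background rather than something stated in this description, and the 50 ohm value below is a hypothetical example.

```python
def reflection_coefficient(load_ohms: float, line_ohms: float) -> float:
    """Fraction of the incident voltage wave reflected at the termination.

    load_ohms is the termination resistance; line_ohms is the
    characteristic impedance of the transmission line.
    """
    return (load_ohms - line_ohms) / (load_ohms + line_ohms)


# Matched termination (hypothetical 50 ohm line): no reflection.
matched = reflection_coefficient(50.0, 50.0)
# Mismatched termination: part of the signal reflects back toward the driver.
mismatched = reflection_coefficient(100.0, 50.0)
print(matched, round(mismatched, 3))
```

A coefficient of zero means the signal is absorbed at the termination without reflections that would otherwise distort it and delay a clean transition at logic circuitry 210.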
Essentially, in an embodiment, one or more analog transmission lines are effected to transport the signal, e.g., as quickly as possible, from the input 211 to logic circuitry 210. By so doing, substantial logical processing implemented with logic gates, each having an associated, unwanted propagation delay, is avoided as much as is practicable. The result is that the fast throttle signal's propagation delay through the processor 201 is reduced so that it reaches logic circuit 210 as fast as practicable.
From the discussion above, emphasis is therefore made to reduce the overall propagation delay through the power sense circuitry 206 and along the quick signal path 209 within the processor 201. By so doing, logic circuitry 210 causes the instruction execution pipeline(s) 213 to reduce the rate at which instructions are executed “almost immediately” after a power draw exceeding a threshold for the power supply unit 205 occurs.
Here, the more the propagation delay through these circuits 206, 209 is reduced, effectively, the smaller and cheaper the power supply unit 205 is permitted to be. As alluded to above, a power supply unit 205 can typically handle a “power surge” beyond its rated maximum for a brief moment of time—but not a sustained period of time. By designing into the system a closed loop response that quickly reduces the power draw of the processor 201 within the time window that the power supply unit 205 can supply power beyond its pre-established threshold, a larger more expensive power supply designed to handle extreme power surges over sustained periods of time need not be designed into the system. As such, the system can “get away with” using a lower performance power supply unit 205.
After time window 350, the processor suddenly approaches a worst case Pmax power draw state. The voltage regulator power draw 302 surges in response. During the surge, the power draw from the voltage regulator surpasses the threshold 320 for the power supply unit 305. Shortly thereafter, the power sense circuit raises the fast throttle down signal 306 which quickly propagates through the processor and reaches logic that begins to throttle down the instruction issue rate 330. The processor power draw 301 begins to drop in response 331 and ultimately causes the power draw from the voltage regulator 302 to fall 332 below the threshold 320.
Viewing any voltage regulator power draw beneath threshold 320 as a power draw that the power supply unit can handle for a sustained period of time, and, any power draw above threshold 320 as a power draw that the power supply unit cannot handle for a sustained period of time, but can handle for a brief period of time, note that the fast action of the power sense circuit and low propagation delay path through the processor causes the power supply draw from the voltage regulator 302 to exceed the power supply unit's threshold level 320 for only a brief period of time 323. As such, a power supply unit that is not capable of satisfying a sustained power draw when the processor is drawing at its Pmax level (and, e.g., is only capable of satisfying a sustained power draw at or below threshold level 320) can nevertheless be implemented in the system.
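The surge-and-recovery sequence described above can be sketched as a simple discrete-time model. Everything in this sketch is an assumption made for illustration: the 1 μs time step, the linear decay of the power draw once throttling begins, and all of the wattage and delay values. It is not a model of any actual processor or power supply.

```python
def simulate_surge(duration_us, surge_start_us, baseline_w, surge_w, floor_w,
                   sense_delay_us, response_delay_us, decay_w_per_us):
    """Return the regulator power draw (W) sampled once per microsecond.

    Hypothetical model: the draw jumps to surge_w at surge_start_us, stays
    there until the sense circuit and the processor have both reacted, then
    decays linearly down to floor_w.
    """
    decay_start_us = surge_start_us + sense_delay_us + response_delay_us
    samples = []
    for t in range(duration_us):
        if t < surge_start_us:
            power = baseline_w
        elif t < decay_start_us:
            power = surge_w
        else:
            power = max(floor_w, surge_w - decay_w_per_us * (t - decay_start_us))
        samples.append(power)
    return samples


def time_above_threshold_us(samples_w, threshold_w):
    """Longest contiguous run (in 1 us samples) strictly above the threshold."""
    longest = current = 0
    for power in samples_w:
        current = current + 1 if power > threshold_w else 0
        longest = max(longest, current)
    return longest


# Illustrative values only: 80 W baseline, 160 W surge, 100 W threshold,
# 20 us to sense, 20 us for the processor to react, 2 W/us decay.
draw = simulate_surge(200, 10, 80.0, 160.0, 80.0, 20, 20, 2.0)
print(time_above_threshold_us(draw, 100.0), "us above threshold")
```

Shortening either delay in this model shortens the time spent above the threshold by the same amount, which is the relationship the description relies on when it ties propagation delay to power supply cost.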
In an embodiment, the brief amount of time that the smaller and/or less expensive power supply unit can provide power when above its threshold level 320 is about 100 μs. Thus, in an embodiment, time period 323 should be less than 100 μs. High performance sense circuitry should be able to achieve sense times within a 1-10 μs range.
In one embodiment, a 40 μs time budget is specified for time period 324. Here, it should take 40 μs from the moment the power draw of the voltage regulator 302 surpasses threshold 320 to the moment the power draw from the voltage regulator 302 begins to decrease. According to one approach, the total time budget is approximately split between the power sense circuit and the processor. As such, the power sense circuit is allocated 20 μs to raise the fast throttle down signal after the voltage regulator power draw surpasses threshold 320, and, the processor is allocated 20 μs to begin reducing its power consumption after it first receives the fast throttle down signal (note
In an embodiment, the threshold level 320 established for the power supply unit is no lower than what the power draw on the power supply unit is expected to be when the processor is in its highest performance state and is processing a workload that is typical of the kinds of workloads that are processed by the processor in its highest performance state (or some percentage, e.g., 10%, beyond such a power draw). In another or related embodiment, the threshold 320 is no higher than a power that would be drawn if the processor were drawing at its Pmax level. In many embodiments threshold level 320 would be significantly beneath this level.
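The timing and threshold constraints discussed above can be expressed as two small checks. This is a minimal sketch: the function names, the inclusion of a separate decay time, and the example wattages are assumptions introduced here for illustration.

```python
def fits_surge_window(sense_time_us, processor_response_us, decay_time_us,
                      surge_window_us=100.0):
    """True if the draw returns below the threshold within the surge window.

    sense_time_us: time for the power sense circuit to raise the signal.
    processor_response_us: time for the processor to begin reducing power.
    decay_time_us: assumed time for the draw to then fall below the threshold.
    """
    total_us = sense_time_us + processor_response_us + decay_time_us
    return total_us < surge_window_us


def threshold_in_bounds(threshold_w, typical_peak_draw_w, pmax_draw_w,
                        margin=0.10):
    """True if the threshold lies between the typical peak draw (plus a
    margin, e.g. 10%) and the draw that corresponds to the Pmax level."""
    lower_bound_w = typical_peak_draw_w * (1.0 + margin)
    return lower_bound_w <= threshold_w <= pmax_draw_w


# A 20 us / 20 us split of a 40 us budget, plus a hypothetical 30 us decay.
print(fits_surge_window(20.0, 20.0, 30.0))
# Hypothetical draws: 100 W typical peak, 200 W at Pmax, 125 W threshold.
print(threshold_in_bounds(125.0, 100.0, 200.0))
```

A system designer could run checks of this kind against published response figures when choosing a threshold level for a candidate power supply unit.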
In order to assist system designers, in an embodiment, the processor's published specifications articulate a fast throttle down signal response that specifies the propagation delay from the moment the processor receives the fast throttle down signal to the moment the processor begins to reduce its power draw. In a further embodiment, the published specifications also specify a rate or envelope at which the power draw decays or other similar information. For example, the published specification may specify one or more propagation delays that specify the amount of time, after assertion of the fast throttle down signal at the processor input, for the processor's power draw to fall from the Pmax level to one or more lower levels.
With this type of information, system designers can determine the appropriate voltage regulator response times and power draws and the power sense circuit response times for any particular power supply unit threshold level 320. The power supply unit threshold level 320 essentially determines the size and/or cost of the power supply. That is, smaller and/or cheaper power supply units will have lower threshold levels 320 than larger and/or more expensive power supply units. As such, the more motivated a designer is to integrate a smaller and/or less expensive power supply unit into the system, the more the designer is correspondingly motivated to integrate a faster power sense circuit 206 and voltage regulator 202.
In a further embodiment, the “throttled down” instruction issue rate of the instruction execution pipeline(s) that results in response to an asserted fast throttle down signal is a programmable feature of the processor. This permits system designer control of the rate at which the processor will reduce its power consumption once the fast throttle down signal has been asserted. For example, the processor may include model specific register (MSR) space that permits an Operating System (OS) instance or Virtual Machine Monitor (VMM) to set a value in the MSR space that sets a maximum limit on the number of instructions that can be fetched and/or issued per unit of time. Note that a limit on instruction fetch into the pipeline essentially limits instruction issue. As such, instruction issue will be used to refer to both mechanisms.
A lower limit will cause the processor's power consumption to fall more rapidly once the fast throttle down signal has been asserted than a higher limit. Permitting the system designer to specify the power reduction rate of the processor in response to an assertion of the fast throttle down signal should provide the system designer with additional flexibility in terms of defining an appropriate voltage regulator, power sense circuit and power supply unit. In an embodiment, the specifications for the processor also specify different power reduction rates of the processor for different programmed reduced instruction fetch and/or issue rate values.
According to another approach, once the fast throttle down signal is asserted, the instruction execution pipelines stop issuing instructions so that the processor effectively stops further processing activity and instantaneously drops its power draw at a maximum or near maximum rate. Complete cessation may be hardwired into the processor by fixed design, or, the user may be able to program a value of 0 instructions fetched/issued per unit time in, e.g., MSR space.
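The programmable issue limit described above can be sketched as a simple software model. The class name, its fields, and the notion of a fixed "window" of time are hypothetical; the description does not define a register layout or an issue-window size.

```python
class IssueThrottle:
    """Hypothetical model of logic that caps instruction issue per window."""

    def __init__(self, normal_limit, throttled_limit):
        # normal_limit: maximum instructions issued per window when unthrottled.
        # throttled_limit: programmed cap (e.g., via MSR space) that applies
        # while the fast throttle down signal is asserted; 0 means full stop.
        if throttled_limit < 0 or throttled_limit > normal_limit:
            raise ValueError("throttled_limit must be in [0, normal_limit]")
        self.normal_limit = normal_limit
        self.throttled_limit = throttled_limit

    def issue(self, ready_instructions, throttle_asserted):
        """Return how many ready instructions are issued in this window."""
        limit = self.throttled_limit if throttle_asserted else self.normal_limit
        return min(ready_instructions, limit)


partial = IssueThrottle(normal_limit=8, throttled_limit=2)
full_stop = IssueThrottle(normal_limit=8, throttled_limit=0)
print(partial.issue(6, throttle_asserted=False))   # unthrottled
print(partial.issue(6, throttle_asserted=True))    # capped at programmed limit
print(full_stop.issue(6, throttle_asserted=True))  # complete cessation
```

Modeling complete cessation as a programmed limit of 0 mirrors the option in the text of programming 0 instructions per unit time rather than hardwiring the behavior.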
Regardless of the rate at which instruction issuance is throttled down, different design options also exist as to how to exit the throttled down mode after it is entered. According to a first approach, the throttled down mode exists for a fixed time period and then switches over to an established performance state of the processor. In an embodiment the performance state is not the highest performance state. Entry into a performance state that is lower than the highest performance state should force at least one of a supply voltage and/or a clock frequency of the processor to be reduced compared to the voltage/frequency that existed prior to the processor's reception of the fast throttle down signal.
In another embodiment, the time period that the processor spends in throttle down mode is programmable. That is, for example, an OS instance or VMM may enter a value in MSR space that establishes how long the processor is to remain in throttle down mode once the mode is entered. In a further or alternate embodiment, the specific performance state that the processor switches over to when coming out of the throttle down state can also be programmed into the processor in, e.g., MSR space.
In an even further embodiment, reception of the fast throttle down signal causes an interrupt or other kind of warning flag to be raised to software (e.g., OS instance or VMM) so that, for example, the instruction sequence that caused the power surge can be branched out of, or, processed in a lower performance state. Either or both of these reactions can be imposed by way of software control through appropriate registers. Here, the processor may be designed to include logic circuitry that raises the interrupt or flag in response to the processor's reception of the fast throttle down signal.
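The entry into and exit from the throttled down mode can be sketched as a small state machine. All names here are hypothetical, time is counted in abstract ticks, and performance states are numbered with 0 as the highest performance state, which is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ThrottleController:
    """Hypothetical model of throttle down mode entry and exit."""
    throttle_duration_ticks: int   # programmable time spent in throttle mode
    exit_p_state: int              # programmable state entered on exit
    current_p_state: int = 0       # assumption: 0 is the highest state
    throttled: bool = False
    ticks_remaining: int = 0
    interrupt_pending: bool = False

    def on_throttle_signal(self):
        """Fast throttle down signal received at the processor input."""
        self.throttled = True
        self.ticks_remaining = self.throttle_duration_ticks
        # Flag for software (e.g., OS instance or VMM) to inspect.
        self.interrupt_pending = True

    def tick(self):
        """Advance one time unit; leave throttle mode when the timer expires."""
        if not self.throttled:
            return
        self.ticks_remaining -= 1
        if self.ticks_remaining <= 0:
            self.throttled = False
            self.current_p_state = self.exit_p_state


ctrl = ThrottleController(throttle_duration_ticks=3, exit_p_state=1)
ctrl.on_throttle_signal()
for _ in range(3):
    ctrl.tick()
print(ctrl.throttled, ctrl.current_p_state, ctrl.interrupt_pending)
```

Exiting into a state other than the highest one reflects the embodiment in which the processor resumes at a reduced voltage and/or frequency, so that the same instruction sequence is less likely to trigger another surge immediately.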
It is believed that the software processes discussed above may be performed with a processor, controller, micro-controller or similar component. As such, these processes may be implemented with program code such as machine-executable instructions that cause a machine that executes these instructions to perform certain functions. These processes may also be performed (in the alternative to the execution of program code or in combination with the execution of program code) by electronic circuitry designed to perform the processes (or a portion thereof).
It is believed that any software processes may be described in source level program code in various object-orientated or non-object-orientated computer programming languages. An article of manufacture such as a computer readable medium may be used to store program code. An article of manufacture that stores program code may be embodied as, but is not limited to, one or more memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions. Program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)).
The memory controller 604 reads/writes data and instructions from/to system memory 606. The I/O hub 605 manages communication between the processor and “I/O” devices (e.g., non volatile storage devices and/or network interfaces). Port 607 stems from the interconnection network 602 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 608 performs graphics computations. Power management circuitry 609 manages the performance and power states of the processor as a whole (“package level”) as well as aspects of the performance and power states of the individual units within the processor such as the individual cores. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country | |
---|---|---|---|
20140095905 A1 | Apr 2014 | US |