The inventions generally relate to hardware proactive implementation for active mode power control of platform resources.
The basic (or uncoupled) power control of a resource (for example, a link, DRAM rank, processor core, etc.) typically proceeds in a manner that uses a low power state. When the resource is found to remain idle for some period of time, it is transitioned to a low power state where it stays either until the arrival of new requests for the resource or until some predetermined time expires.
The inventions will be understood more fully from the detailed description given below and from the accompanying drawings of some embodiments of the inventions which, however, should not be taken to limit the inventions to the specific embodiments described, but are for explanation and understanding only.
Some embodiments of the inventions relate to hardware proactive implementation for active mode power control of platform resources.
In some embodiments, a duration of an idle period gap of a lower power state of a resource is estimated by exponentially smoothing successive idle period gaps.
In some embodiments, a reduction is made in power consumption of components of a platform (e.g., links, memory, processing cores, etc.) by opportunistically putting them in a low power state. A component can be put in the low power state when not in use (generally, when the queue of requests for it drains out), and it must be brought back to the normal power state before the next use. For example, a link can be put in a low power state when there is nothing to transmit, but must transition to a normal power state on or prior to the arrival of a new transmit request. Both entry into the low power state and exit out of it take some finite amount of time, which is henceforth referred to as a “cost” or “latency”. If the exit from the low power state is effected only when a new request for the component arrives, the new request will encounter the full exit latency. This additional latency generally results in degraded performance. Hence, the purpose of the power control algorithm is to decide when to transition the component into and out of the low power state so that it can fulfill two conflicting goals: (a) maximize power savings, and (b) minimize latency (or performance) impact. The word “resource” is used herein for any independently power controlled component.
The uniqueness of the implementation according to some embodiments is that it is proactive in nature, in that it predicts the duration of non-use of the resource and attempts to transition to the normal power state ahead of the new request arrival. At the same time, the implementation has very low complexity and is appropriate for direct hardware (HW) implementation and high speed operation. In some embodiments, the proactive nature of the implementation is especially useful in combating the effects of rather high entry and exit latencies that are currently encountered in some of the key platform resources for which it is intended.
In some embodiments, the implementation can be applicable to one or more of the following (but is not limited to the following): (a) Power control of processor-memory links (Quickpath™ and its successors), (b) Power control of industry standard PCI-E links connecting IO devices to the IO hub, (c) Power control of derivatives of PCI-E used in the chipset (e.g., DMI), (d) Power control of FBD links used with FBD memory, (f) All forms of CKE (clock-enable) control of DRAM power, which may operate at the level of DRAM ranks, DIMMs or memory channels, and (g) C-state control of individual CPU cores.
In some embodiments, an ESA (exponential smoothing algorithm) is used to estimate the gap via an exponential smoothing of successive gaps. To explain exponential smoothing, let Gn denote the running estimate of the gap size at nth step and let gn denote the actual gap encountered. Then the running estimate Gn is estimated as follows:
$$G_n = (1-\alpha)\,G_{n-1} + \alpha\,g_n$$
where α ∈ [0, 1] is the smoothing constant which specifies how much weight a new sample gets.
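The recurrence above can be illustrated with a minimal sketch. The function names `smooth` and `run_smoother`, the floating-point representation, and the initial estimate are assumptions made for illustration only and are not taken from the description.

```python
def smooth(prev_estimate, sample, alpha):
    """One smoothing step: G_n = (1 - alpha) * G_{n-1} + alpha * g_n."""
    return (1.0 - alpha) * prev_estimate + alpha * sample


def run_smoother(gaps, alpha, initial):
    """Feed a sequence of observed gaps g_n and collect the running estimates G_n."""
    estimate = initial  # hypothetical starting value for G_0
    history = []
    for gap in gaps:
        estimate = smooth(estimate, gap, alpha)
        history.append(estimate)
    return history


# With alpha = 0.5 a new sample and the old estimate get equal weight.
estimates = run_smoother([100, 100, 200], alpha=0.5, initial=100.0)
```

A larger α makes the estimate follow recent gaps more closely, while a smaller α makes it change more slowly.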
Successive gaps can vary widely and a simple exponentially smoothed estimate is inadequate. Instead, ESA uses a biased exponential estimate specified as follows:
if gn≦Gn-1
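A minimal sketch of a biased smoother of this kind follows. It assumes that one smoothing constant (α) applies when the new gap does not exceed the previous estimate and that a second constant (β) applies otherwise. The assignment of α and β to the two branches, and the function name `biased_smooth`, are assumptions made for illustration.

```python
def biased_smooth(prev_estimate, sample, alpha, beta):
    """One biased smoothing step.

    Assumption: alpha weights the sample when g_n <= G_{n-1},
    and beta weights it when g_n > G_{n-1}.
    """
    weight = alpha if sample <= prev_estimate else beta
    return (1.0 - weight) * prev_estimate + weight * sample


# With alpha > beta the estimate drops quickly on a short gap
# and rises slowly on a long one.
after_short_gap = biased_smooth(100.0, 50.0, alpha=0.5, beta=0.125)
after_long_gap = biased_smooth(100.0, 200.0, alpha=0.5, beta=0.125)
```

Under this assumed assignment, choosing α larger than β would bias the estimate toward shorter gaps, which errs on the side of leaving the low power state early.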
In some embodiments, R0 denotes the runway (or wait and watch period) and ηe and ηx denote, respectively, the entry and exit costs. Both ηe and ηx are fixed and known for a given resource, and R0 is known when preparing to go into low power mode. Then, the predicted residence time in low power state, denoted L, is given by:
$$L_n = G_n - R_0 - \eta_e - \eta_x$$
If the predicted Ln is negative or too small, the algorithm may decide to change the runway R0 for subsequent gaps as discussed below.
In some embodiments, ESA is actually a two-phase algorithm and uses a separate runway in each phase. The first phase is as shown at 106 in
In some embodiments, the control of runways R0 and R1 is an integral part of ESA. Several ways of varying R0 were attempted, of which two are explicitly stated herein and lead to two different flavors of ESA. The simpler version (referred to as basic ESA) is most appropriate for HW implementation and controls R0 as follows:
if Ln<1 then R0=W0(1+Limit/8) else R0=W0
where W0=(ηe+ηx)/2 is the minimum runway setting and Limit is the only user-settable control parameter used by ESA. The main idea of the above equation is to increase the wait and watch period if our current estimate indicates that we might be going into the low power state for too short a period. The runway R1 for the second phase of basic ESA is set as R1=W0×Limit. Notice that R1 is substantially larger than R0 so that low power mode entry in the second phase is very infrequent. This is essential in order to effectively control the latency impact of the reactive exit.
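The runway rules of basic ESA can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the function names, the use of integer clock units, and the example costs are assumptions. It combines the predicted residence time (the gap estimate minus the runway and the entry and exit costs) with the R0 and R1 rules stated above.

```python
def min_runway(entry_cost, exit_cost):
    """W0 = (eta_e + eta_x) / 2, computed with a shift."""
    return (entry_cost + exit_cost) >> 1


def predicted_residence(gap_estimate, runway, entry_cost, exit_cost):
    """L_n = G_n - R0 - eta_e - eta_x."""
    return gap_estimate - runway - entry_cost - exit_cost


def basic_runways(gap_estimate, current_r0, entry_cost, exit_cost, limit):
    """Return (L_n, next R0, R1) for basic ESA.

    If Limit is a power of 2, the products below reduce to shifts.
    """
    w0 = min_runway(entry_cost, exit_cost)
    residence = predicted_residence(gap_estimate, current_r0, entry_cost, exit_cost)
    if residence < 1:
        next_r0 = w0 + ((w0 * limit) >> 3)  # W0 * (1 + Limit / 8)
    else:
        next_r0 = w0
    r1 = w0 * limit  # runway for the second phase
    return residence, next_r0, r1


# Hypothetical costs: 20 clocks each for entry and exit, Limit = 16.
short_gap = basic_runways(50, 20, 20, 20, 16)
long_gap = basic_runways(500, 20, 20, 20, 16)
```

In this example the short predicted gap lengthens the next first-phase runway from 20 to 60 clocks, while the second-phase runway stays at 320 clocks in both cases.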
The advanced ESA version bases its R0 on an estimate of the current utilization of the resource. The utilization is estimated from exponentially smoothed estimates of the gaps (or off times) as well as busy periods (or on times). The on-time estimate, henceforth denoted as Bn, uses a single smoothing constant γ, i.e.,
$$B_n = (1-\gamma)\,B_{n-1} + \gamma\,b_n$$
where bn is the nth busy period. With this, the resource utilization U can be estimated as
$$U_n = \frac{B_n}{B_n + G_n}$$
The smoothing constant γ is chosen as roughly the geometric mean of α and β. The runway R0 is then made roughly proportional to the utilization Un. It is noted that there is no need for an explicit division operation, and even the multiplication can be avoided by keeping the proportionality approximate. Such a setting would make the algorithm more cautious at high utilizations (so as to minimize latency impact of wrongful entry into low power state) and more aggressive at low utilizations (so as to maximize time spent in low power state).
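One way to obtain an approximate utilization without a divider is sketched below. The quantization into eighths, the hypothetical upper bound `r_max`, and the interpolation between W0 and `r_max` are all assumptions made for illustration; the description only states that R0 is made roughly proportional to the utilization.

```python
def utilization_eighths(busy_estimate, gap_estimate):
    """Return the largest k in 0..8 with k/8 <= B / (B + G).

    Uses only add, shift, and compare operations.
    """
    total = busy_estimate + gap_estimate
    if total == 0:
        return 0  # no history yet; treated as idle (assumption)
    scaled_busy = busy_estimate << 3  # 8 * B
    k = 0
    acc = total  # invariant: acc == (k + 1) * total
    while k < 8 and acc <= scaled_busy:
        k += 1
        acc += total
    return k


def advanced_r0(busy_estimate, gap_estimate, w0, r_max):
    """Grow R0 from w0 toward r_max in steps of one eighth of the range."""
    k = utilization_eighths(busy_estimate, gap_estimate)
    step = (r_max - w0) >> 3
    r0 = w0
    for _ in range(k):  # repeated addition instead of a multiply
        r0 += step
    return r0


# B = 30 and G = 10 give a utilization of 0.75, i.e. k = 6.
busy_runway = advanced_r0(30, 10, w0=20, r_max=100)
idle_runway = advanced_r0(0, 10, w0=20, r_max=100)
```

A higher utilization yields a longer runway, so the sketch waits longer before entering the low power state when the resource is busy.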
In some embodiments, the parameter Limit controls the tradeoff between power savings and latency impact of the algorithm. It turns out that it is impossible to achieve both high power savings and low latency impact. The Limit parameter allows the algorithm to operate in the desired region.
In almost all the contexts mentioned above (with the possible exception of processor core C-state control), the algorithm must operate at very high speed and must be easily implemented in hardware. To facilitate this, both α and β must be chosen as negative powers of 2, which is generally an adequate granularity in practice. Similarly, Limit must be a power of 2. With these choices, all computations can be implemented with only integer add, subtract, and shift operations. The other important issues are precision and range. The gap and busy period measurements are typically in units of an underlying clock and may have a limited range. For example, if the gap sizes are limited to 65535 clock units, the exponential smoother will be insensitive to larger gaps. The calculations required for exponential smoothing must be done using a somewhat larger number of bits in order to control truncation errors. It is found that an additional 4 bits of precision is adequate for implementing ESA well.
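A shift-only form of the smoothing step can be sketched as follows. The names, the fixed-point layout (estimate stored with 4 extra fractional bits), and the saturation of samples at 65535 clocks follow the example figures in the text but are otherwise assumptions made for illustration. The smoothing constant is expressed as a shift count, so that α = 2^(-shift).

```python
FRAC_BITS = 4     # extra bits of precision carried by the estimate
MAX_GAP = 0xFFFF  # assumed counter range of 65535 clock units


def smooth_fixed(estimate_fp, sample, shift):
    """One smoothing step with alpha = 2**-shift, using add, subtract, shift.

    estimate_fp holds the estimate scaled by 2**FRAC_BITS.
    Equivalent to G += alpha * (g - G). Python's >> floors negative
    values, like an arithmetic right shift in hardware.
    """
    sample = min(sample, MAX_GAP)  # larger gaps saturate
    sample_fp = sample << FRAC_BITS
    return estimate_fp + ((sample_fp - estimate_fp) >> shift)


def to_clocks(estimate_fp):
    """Drop the fractional bits to get the estimate in clock units."""
    return estimate_fp >> FRAC_BITS


# Estimate of 100 clocks, new gap of 200 clocks, alpha = 1/4.
updated = smooth_fixed(100 << FRAC_BITS, 200, shift=2)
```

The extra bits matter for small differences: with α = 1/8, a sample of 101 against an estimate of 100 changes the scaled estimate from 1600 to 1602, whereas the same update on unscaled integers would truncate to no change at all.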
Both simple and advanced ESA algorithms have been evaluated via detailed simulations as illustrated, for example in
Detailed simulations have been used to select the Limit parameter for Quickpath, PCIE and CKE applications of ESA.
The x-axis in both graphs 200 and 300 of
In some embodiments, a proactive approach is used that is crucial to obtaining power savings without a large latency impact when the low power state entry and exit costs are large. The latter situation applies at least to Quickpath, PCIE, and FBD, and is unlikely to change in the near future. The present inventors believe that the current state of the art is a reactive algorithm (for example, for Quickpath and PCIE). The parameters for this algorithm have been set such that the algorithm will not come into effect except at extremely low utilizations, because of concerns about its latency impact. In contrast, in some embodiments the proactive algorithm prescribed here can be used much more aggressively.
In some embodiments, operational statistics and support interfaces for an external power control engine (e.g., power control unit or PCU) may be used to control the ESA engine. In some embodiments, the ESA is on a landing zone (runway) for power control of Quickpath (CSI), DMI, PCIE, and/or CKE. ESA can also be used in some embodiments for other applications (for example, for C-state control of processor cores).
In some embodiments, the invention may be used by other vendors in a variety of power control contexts. This includes use, for example, by processor vendors, as well as use by other HW vendors (e.g., graphics HW manufacturers) and OS vendors (for OS directed power control of devices). In some embodiments, any power control algorithm may be implemented that uses some variant of exponential smoothing to predict idle duration, and may be used to do power control.
Although some embodiments have been described herein as being included in particular implementations, according to some embodiments these particular implementations may not be required.
Although some embodiments have been described in reference to particular implementations, other implementations are possible according to some embodiments. Additionally, the arrangement and/or order of circuit elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some embodiments.
In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
In the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
Some embodiments may be implemented in one or a combination of hardware, firmware, and software. Some embodiments may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by a computing platform to perform the operations described herein. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, the interfaces that transmit and/or receive signals, etc.), and others.
An embodiment is an implementation or example of the inventions. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.
Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular embodiment or embodiments. If the specification states a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
Although flow diagrams and/or state diagrams may have been used herein to describe embodiments, the inventions are not limited to those diagrams or to corresponding descriptions herein. For example, flow need not move through each illustrated box or state or in exactly the same order as illustrated and described herein.
The inventions are not restricted to the particular details listed herein. Indeed, those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present inventions. Accordingly, it is the following claims including any amendments thereto that define the scope of the inventions.