Modern processors include significant amounts of circuitry and operate at ever-increasing frequencies. The trend in processor design is towards multicore designs in which multiple independent processor cores are present on one or more semiconductor dies of a processor package. By providing multiple processor cores, often of relatively simple design, workloads can be efficiently split up and executed in parallel on the different cores.
To provide power savings, a given processor core can be placed into a low power state when it is not being used. When all cores of a given package are in a low power state, the package itself can be placed into a low power state in which deeper power savings are available. However, this comes at the expense of greater latency in exiting from such a low power state.
To maintain a measure of core temperature, one or more cores may each have an associated thermal sensor. This temperature information is used in part to determine an appropriate frequency and voltage at which to operate the core. Such temperature information is also used when a core wakes from a low power state, to enable an appropriate voltage (and frequency) to be provided to the core. Stale temperature information obtained prior to the low power state may be unsuitable for determining an appropriate voltage at which to operate the core. Thus many processors keep a thermal sensor powered on even when the corresponding core is in a low power state. However, this reduces the benefit of the low power state, and also prevents entry into certain deeper low power states.
In various embodiments, a temperature of a core of a multicore processor that is in a low power state can be estimated. By way of this estimation, determinations can be made as to appropriate voltages and/or frequencies at which to operate the core when it exits the low power state. As process technology evolves, the voltage used to operate processor circuits at a certain frequency is strongly dependent on the operating temperature. By estimating the temperature of a core even when it is powered down, circuit functionality can be guaranteed when the core is brought out of a low power state. That is, for devices fabricated in some semiconductor processes, a reverse temperature coefficient exists such that as the die cools, a higher voltage is needed to run at a given frequency. Without an estimation such as that made available in accordance with an embodiment of the present invention, a core may be assumed upon waking to be at a higher temperature than it actually is, and may be provided an insufficient voltage to run its circuitry, potentially leading to a speed path failure.
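The effect of the reverse temperature coefficient can be illustrated with a minimal sketch. The linear model, the function name `required_voltage`, and every numeric constant below are hypothetical values chosen for illustration only; the description does not specify a voltage model.

```python
def required_voltage(temp_c, v_at_ref=0.90, ref_temp_c=100.0, slope_v_per_c=0.001):
    """Return the voltage assumed necessary to hold a fixed frequency.

    Hypothetical linear model of a reverse temperature coefficient:
    the cooler the die, the higher the voltage. All constants are
    illustrative assumptions, not values from the description.
    """
    return v_at_ref + slope_v_per_c * max(0.0, ref_temp_c - temp_c)


# A core that went to sleep at 90 C and has actually cooled to 40 C.
stale_temp_c = 90.0
actual_temp_c = 40.0

v_from_stale = required_voltage(stale_temp_c)
v_needed = required_voltage(actual_temp_c)

# Using the stale (too hot) temperature yields a voltage below what the
# cooled die needs, which is the failure case the estimation avoids.
undervolted = v_from_stale < v_needed
```

In this sketch `undervolted` is true, which corresponds to the case in which a core assumed to be hotter than it actually is receives insufficient voltage.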
Furthermore, a scheduling algorithm that takes into account temperature information can make a scheduling decision based on this information. In addition, embodiments may further estimate the temperature of multiple cores of a multicore processor when these multiple cores are in a low power state. For example, two cores of a multicore processor both can be in a low power state and the temperature of each can be estimated. Still further, when all cores of a multicore processor are in a low power state (such as in a given package low power state), the temperature of all cores can be estimated.
Although the scope of the present invention is not limited in this regard, some embodiments may be implemented in logic of a power control unit (PCU) or other uncore or system agent circuitry of a multicore processor. Such a PCU is a central control unit in the processor that receives temperature information from thermal sensors, estimates leakage and prescribes an operating voltage for the cores. In this way, temperature estimations can be made while one or more cores are in a low power state. In addition, one or more thermal sensors associated with these cores may also be in a low power state. As a result, greater power savings can be achieved by enabling both cores and their associated thermal sensors to be in a low power state. Nevertheless, valid temperature information can still be determined to enable appropriate scheduling, voltage, frequency and other decisions to be made.
Referring to
Method 100 thus can be performed by logic of a PCU when a temperature estimate is to be performed for a core. As seen, at block 110 a stored temperature of the core and all other cores can be obtained. Such information can be obtained from a thermal storage area associated with each core. This storage area can be an entry of a temperature memory, which can include entries for each core to store temperature and related information. Next at diamond 120 it can be determined whether the low power core is hotter than all other cores. If not, a temperature of the low power core can be estimated to an increased value (block 130). The temperature for the low power core can be estimated using a charging equation, as will be described further below. This occurs as the higher temperature of the other cores can cause the temperature of the low power core to also increase due to thermal coupling of the cores.
Otherwise if the low power core is hotter than all other cores, control passes to block 140 where a temperature of the low power core can be estimated to a decreased value. As an example, this decreased value can be estimated using a decay equation as will be discussed below. From both of blocks 130 and 140, control passes to block 150 where the estimated temperature can be stored in a thermal storage area associated with the low power core. Although described with this particular implementation in the embodiment of
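The flow of method 100 can be sketched as follows. This is a minimal illustration under stated assumptions: the first-order exponential form, the choice of the hottest other core as the target temperature in both branches, the time constants, and the name `estimate_low_power_core` are all hypothetical, since the charging and decay equations themselves are not reproduced in this passage.

```python
import math


def estimate_low_power_core(stored_temps, core, elapsed_s,
                            tau_charge_s=2.0, tau_decay_s=2.0):
    """Estimate and store a new temperature for a core in a low power state.

    stored_temps maps core id -> last stored temperature in deg C (block 110).
    The exponential form and both time constants are assumptions.
    """
    own = stored_temps[core]
    others = [t for c, t in stored_temps.items() if c != core]
    if not others:
        return own  # nothing to compare against; keep the stored value

    hottest_other = max(others)
    if own > hottest_other:
        # Diamond 120 "yes" -> block 140: decay toward the hottest other core.
        tau = tau_decay_s
    else:
        # Diamond 120 "no" -> block 130: charge toward the hottest other core.
        tau = tau_charge_s

    # Close part of the gap to the target; the gap shrinks as time elapses.
    estimate = hottest_other + (own - hottest_other) * math.exp(-elapsed_s / tau)
    stored_temps[core] = estimate  # block 150: store the estimate
    return estimate
```

With stored temperatures of 50 and 70 degrees for cores 0 and 1, estimating core 0 yields a value between 50 and 70 (an increase), while a low power core stored at 90 next to a 70 degree neighbor yields a value between 70 and 90 (a decrease).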
Referring now to
As seen, the analysis for a given core can begin with a determination at diamond 215 as to whether a valid indicator for the core is set. This indicator may be used to indicate that a valid thermal sensor reading has been recently obtained for that core, namely that the core and corresponding thermal sensor are in an active state. In one embodiment, this valid indicator can be stored in the thermal storage area associated with the core, e.g., as a valid bit that is stored in an entry for this core along with temperature information, namely temperature data obtained from the thermal sensor and/or a calculated or estimated temperature generated by the logic. In another embodiment, the valid indicator may be a bit per thermal sensor (or core) that can be maintained in the PCU and used to track whether the thermal information coming from the sensor in a core is valid or not. If the thermal valid bit is cleared, the data coming from that particular sensor is not valid. If a valid indicator is present, control passes to block 220 where the thermal sensor data can be read from the entry for the core. This information may be obtained from a thermal sensor, which in one embodiment is coupled to the PCU via a dedicated interconnect, e.g., a push bus. In many implementations, each core can include at least one thermal sensor to measure core temperature and transmit this information, e.g., to the PCU. Each thermal sensor has a “scan time,” which is the time it takes for the sensor to estimate temperature accurately from the time it is turned on. After this scan time, a thermal sensor can continuously feed temperature data to the PCU. In one embodiment, the push bus acts as an interface between the thermal sensor and the PCU through which the sensor communicates temperature information to the PCU. In one embodiment, the PCU periodically reads the push bus for temperature information, e.g., at an interval of every 128 microseconds (μs).
Control next passes to block 230 where a temperature of the core can be calculated based on this thermal sensor data and a stored temperature for this core, e.g., obtained from a thermal storage area associated with the core. Various manners of calculating a core temperature can be realized in different embodiments. In some implementations, the thermal sensor data obtained from the thermal sensor can be directly stored as the calculated temperature. In other embodiments, various calculations can be performed. For example, an averaging process can occur by averaging the thermal sensor data with a value stored in the thermal storage area, which in turn may be an average of previous readings of the thermal sensor data. Thus a moving average temperature can be maintained over the last N thermal sensor readings from each of the thermal sensors. The value of N can be configurable based on how much smoothing is desired and how much dithering there is in the thermal sensors themselves. In any event, control next passes to block 235 where this calculated/estimated temperature can be recorded for the core. As an example, this value can be stored in the thermal storage area associated with this core. As seen, control iterates back to block 210 for performing similar operations for a next core.
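One way to maintain a moving average over the last N readings is sketched below. The class name `MovingAverageTemp` and the use of a fixed-length window are illustrative assumptions; an embodiment could equally blend each new reading with a single stored average.

```python
from collections import deque


class MovingAverageTemp:
    """Moving average over the last n thermal sensor readings for one core.

    n is the configurable smoothing window described in the text; the
    windowed implementation itself is an assumption for illustration.
    """

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        # deque with maxlen drops the oldest reading automatically.
        self._readings = deque(maxlen=n)

    def update(self, reading_c):
        """Record a new sensor reading and return the current average."""
        self._readings.append(reading_c)
        return sum(self._readings) / len(self._readings)


avg = MovingAverageTemp(n=2)
first = avg.update(60.0)   # only one reading so far
second = avg.update(70.0)  # average of 60 and 70
third = avg.update(80.0)   # 60 has dropped out; average of 70 and 80
```

A larger n gives more smoothing against sensor dithering at the cost of slower response to real temperature changes.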
In general, the operations discussed above from block 210 through block 235 may be the ordinary processing path for determining core temperature when a core and associated thermal sensor are in an active state. In contrast, when a core is in a low power state (and its corresponding thermal sensor is also in a low power state), there will be no thermal sensor data received. Accordingly, method 200 may further provide for an estimation of a core temperature when the core and corresponding thermal sensor are in a low power state.
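The split between the ordinary sensor path and the estimation path can be sketched as a per-core loop. The names `ThermalEntry` and `update_core_temperatures`, and the shape of the `estimate` callback, are hypothetical; the sketch stores the raw reading directly, which is one of the calculation options mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ThermalEntry:
    """One entry of the thermal storage area (field names are assumptions)."""
    temperature_c: float
    valid: bool  # set when the core's sensor is delivering valid readings


def update_core_temperatures(
    entries: Dict[int, ThermalEntry],
    readings: Dict[int, Optional[float]],
    estimate: Callable[[int, Dict[int, float]], float],
) -> None:
    """Update every core's entry from sensor data or from an estimate."""
    # Snapshot stored temperatures so estimates do not see partial updates.
    snapshot = {core: e.temperature_c for core, e in entries.items()}
    for core, entry in entries.items():
        reading = readings.get(core)
        if entry.valid and reading is not None:
            # Active path: valid indicator set, use the sensor reading.
            entry.temperature_c = reading
        else:
            # Low power path: no sensor data, fall back to an estimate.
            entry.temperature_c = estimate(core, snapshot)
```

The `estimate` callback stands in for whatever charging or decay calculation an embodiment uses for a core whose sensor is powered down.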
As seen in
Still referring to
As further shown in
By obtaining this information via the low power state delay mechanism, further temperature estimations for cores in a low power state can be performed. If all the cores in a processor are in a low power state, the thermal valid indicators are cleared and there is no reliable thermal information provided from the thermal sensors to the PCU. By forcing a delay of entry into a package low power state, a thermal run-away scenario can be avoided, which can occur when the package is rapidly transitioning into and out of a package C0 state and every individual package C0 duration is shorter than the scan time interval. This is so because, when every individual package C0 duration is shorter than the scan time interval, the PCU does not receive an updated thermal sensor reading and is either charging or decaying every core's temperature estimate. As a result, the last actual thermal reading from the sensor could have been sent to the PCU a long time ago and the actual die temperature could have drifted far from the PCU estimate. Thus at block 290 to avoid this drift, the PCU delays entry into deep core or package C6 state until a valid thermal sensor reading from at least one of the cores is available, thereby allowing the PCU to catch up to actual die temperature. Although shown with this particular set of operations in the embodiment of
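The delay condition can be sketched as a check over successive PCU polls. The function name `first_poll_allowing_c6` and the representation of each poll as a mapping from core id to valid indicator are assumptions made for illustration.

```python
from typing import Iterable, Mapping, Optional


def first_poll_allowing_c6(polls: Iterable[Mapping[int, bool]]) -> Optional[int]:
    """Return the index of the first poll at which deep entry may proceed.

    Each poll maps core id -> valid indicator. Entry into the deep state is
    held off until at least one core reports a valid sensor reading.
    Returns None if no poll in the sequence satisfies the condition.
    """
    for index, valid_bits in enumerate(polls):
        if any(valid_bits.values()):
            return index
    return None


# Two cores; sensors are still within their scan time on the first two polls.
history = [
    {0: False, 1: False},
    {0: False, 1: False},
    {0: False, 1: True},   # core 1's sensor now delivers a valid reading
]
allowed_at = first_poll_allowing_c6(history)
```

Here `allowed_at` is 2, meaning the deep state request would be held for two polls so that the stored estimates can be refreshed from at least one real reading.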
As seen above, the various calculations depend on determining, by way of valid indicators, whether a given core has a valid temperature reading. Such valid indicators can be controlled by logic of the PCU. Referring now to
Still referring to
Thus embodiments provide a means of tracking die temperature accurately without having to keep the thermal sensors always powered on. In this way, leakage current can be accurately estimated and an operating voltage set accurately when the cores seek entry into an active power state. At the same time, thermal sensors can be powered off when a corresponding core is in a low power state, thus minimizing the idle power of the processor.
Referring now to
In various embodiments, power control unit 455 may include a thermal estimation logic 459, which may be a logic to perform thermal estimation of one or more cores that are in a low power state. In the embodiment of
With further reference to
Referring now to
Note that while only three domains are shown, it is to be understood that the scope of the present invention is not limited in this regard and additional domains can be present in other embodiments. For example, multiple core domains may be present each including at least one core. In this way, finer grained control of the number of processor cores that can be executing at a given frequency can be realized.
In general, each core 510 may further include low level caches in addition to various execution units and additional processing elements. In turn, the various cores may be coupled to each other and to a shared cache memory formed of a plurality of units of a last level cache (LLC) 540_0-540_n. In various embodiments, LLC 540 may be shared amongst the cores and the graphics engine, as well as various media processing circuitry. As seen, a ring interconnect 530 thus couples the cores together, and provides interconnection between the cores, graphics domain 520 and system agent circuitry 550.
In the embodiment of
As further seen in
Embodiments may be implemented in many different system types. Referring now to
Still referring to
Furthermore, chipset 690 includes an interface 692 to couple chipset 690 with a high performance graphics engine 638, by a P-P interconnect 639. In turn, chipset 690 may be coupled to a first bus 616 via an interface 696. As shown in
Embodiments may be implemented in code and may be stored on a non-transitory storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
Number | Date | Country | |
---|---|---|---|
20130080803 A1 | Mar 2013 | US |