Aspects of the present disclosure relate generally to temperature monitoring, and more particularly, to temperature monitoring of circuits.
Due to increases in chip integration and operating frequencies, power densities in chips have dramatically increased, resulting in higher chip temperatures. As a result, temperature monitoring is playing an increasingly important role in protecting chips from damage due to overheating. In this regard, temperature sensors may be integrated on a chip to monitor temperature at various locations on the chip. Temperature readings from the temperature sensors may be input to a temperature manager that manages blocks of circuitry (e.g., central processing units (CPUs)) on the chip based on the temperature readings to prevent overheating.
The following presents a simplified summary of one or more embodiments in order to provide a basic understanding of such embodiments. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor delineate the scope of any or all embodiments. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later.
A first aspect relates to a system. The system comprises a plurality of temperature sensors on a chip, and a multiplexer having a plurality of inputs and an output, wherein each of the inputs is coupled to a respective one of the temperature sensors. The system also comprises an analog-to-digital converter (ADC) coupled to the output of the multiplexer, wherein the ADC is configured to convert an output signal from the output of the multiplexer into a digital signal. The system further comprises a temperature manager configured to instruct the multiplexer to select one or more of the temperature sensors, to receive the digital signal from the ADC, and to compute a temperature based on the digital signal. The multiplexer is configured to generate the output signal based on one or more temperature readings from the selected one or more of the temperature sensors.
A second aspect relates to a method for temperature monitoring. The method comprises receiving temperature readings from a plurality of temperature sensors on a chip, determining an average or a sum of the temperature readings, and computing a temperature at a location on the chip based on the average or sum of the temperature readings.
A third aspect relates to a method for performing a search. The method comprises receiving sensor readings from a plurality of sensors on a chip, dividing a region of the chip into a first region and a second region, determining a first value for the first region based on a first subset of the sensor readings, and determining a second value for the second region based on a second subset of the sensor readings. The method also comprises comparing the first value with the second value, and narrowing the search to one of the first and second regions corresponding to a highest one of the first and second values.
To the accomplishment of the foregoing and related ends, the one or more embodiments include the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative aspects of the one or more embodiments. These aspects are indicative, however, of but a few of the various ways in which the principles of various embodiments may be employed and the described embodiments are intended to include all such aspects and their equivalents.
The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
Temperature sensors may be integrated on a chip to monitor temperature at various locations on the chip. In this regard,
Each temperature sensor 111-114 may be implemented with a temperature-sensitive circuit configured to output a voltage or current that is a function of temperature at the respective sensor location. For example, the temperature-sensitive circuit may output a voltage or current that is approximately a linear function of absolute temperature over a temperature range. Since the output voltage or current of the temperature-sensitive circuit is a function of temperature, the output voltage or current indicates the temperature at the respective sensor location, and therefore provides a temperature reading (temperature measurement) at the respective sensor location. The temperature-sensitive circuit may be implemented with a bandgap-temperature circuit, one or more diode-connected transistors, a temperature-sensitive resistor, etc.
In operation, the temperature manager 120 receives temperature readings (temperature measurements) from the temperature sensors 111-114, where each temperature reading indicates the temperature sensed at the location of the respective sensor. The temperature reading from each temperature sensor 111-114 may be digitized using a respective analog-to-digital converter (ADC) (not shown) before going to the temperature manager 120, allowing the temperature manager 120 to process the temperature reading in the digital domain. For example, if the temperature reading is in the form of a voltage or current that is a function of temperature, the ADC may convert the voltage or current into a temperature reading in the digital domain.
When the temperature manager 120 receives a temperature reading from one of the temperature sensors 111-114, the temperature manager 120 may compare the temperature reading with a temperature threshold. The temperature threshold may correspond to an upper temperature boundary for safe operation of the chip 100. If a temperature reading exceeds the temperature threshold, then the temperature manager 120 may take steps to reduce the temperature to prevent overheating. For example, if the region 110 includes a processor (e.g., CPU) on the chip 100, then the temperature manager 120 may reduce the temperature by reducing the operating frequency of the processor and/or reducing the supply voltage of the processor. Reducing the operating frequency and/or supply voltage of the processor reduces the temperature by reducing the dynamic power dissipation of the processor. In extreme cases where the temperature reading exceeds the temperature threshold by a large amount, the temperature manager 120 may shut down the processor to prevent damage to the chip 100.
The temperature manager 120 in
The multiplexer 220 is configured to selectively couple one or more of the temperature sensors 211-214 to the ADC 225 at a time based on a multiplexer control signal (denoted “MUX control” in
In another example, the temperature manager 230 may command the multiplexer 220 to couple two or more of the temperature sensors 211-214 to the ADC 225 at the same time. In this example, the multiplexer 220 may be configured to average the temperature readings (temperature measurements) from the two or more temperature sensors, and output the average temperature reading to the ADC 225. The ADC 225 may then convert the average temperature reading into digital form, and output the average temperature reading in digital form to the temperature manager 230. This allows the temperature manager 230 to receive a single average temperature reading from the two or more temperature sensors rather than individual temperature readings.
Alternatively, the multiplexer 220 may be configured to sum the temperature readings (temperature measurements) from the two or more temperature sensors, and output the sum of the temperature readings to the ADC 225. The ADC 225 may then convert the sum of the temperature readings into digital form, and output the sum of the temperature readings in digital form to the temperature manager 230. In this aspect, the temperature manager 230 may convert the sum of the temperature readings into an average temperature reading by dividing the sum of the temperature readings by the number of temperature sensors contributing to the sum.
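The two variants above differ only in where the division by the number of sensors happens. The following is a minimal Python sketch. It assumes the readings are already digitized numbers, so the analog averaging or summing inside the multiplexer is abstracted away. The function names `mux_output` and `sum_to_average` and the `mode` strings are hypothetical:

```python
from typing import Sequence


def mux_output(readings: Sequence[float], selected: Sequence[int], mode: str) -> float:
    """Behavioral model of a multiplexer combining the selected sensor readings.

    mode="average" models a multiplexer that averages its selected inputs;
    mode="sum" models one that sums them (hypothetical labels).
    """
    picked = [readings[i] for i in selected]
    if not picked:
        raise ValueError("at least one sensor must be selected")
    total = sum(picked)
    if mode == "sum":
        return total
    if mode == "average":
        return total / len(picked)
    raise ValueError(f"unknown mode: {mode!r}")


def sum_to_average(total: float, num_contributing: int) -> float:
    """Temperature-manager side: divide the sum by the number of contributing sensors."""
    return total / num_contributing


readings = [80.0, 84.0, 78.0, 90.0]  # hypothetical readings from sensors 211-214
selected = [0, 1, 3]                 # select sensors 211, 212 and 214
total = mux_output(readings, selected, "sum")
print(total, sum_to_average(total, len(selected)))
```

If only a single sensor is selected, both modes return that sensor's reading unchanged, which matches the single-sensor case.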
Exemplary implementations of the multiplexer 220 will now be discussed below with reference to
The switch decoder 355 is configured to receive the multiplexer control signal (denoted “MUX control” in
In the example shown in
The multiplexer 320 allows the temperature manager 230 to selectively couple any one of the temperature sensors 211-214 to the ADC 225 at a time. For example, the temperature manager 230 may command the multiplexer 320 to couple a particular temperature sensor to the ADC 225 by commanding the multiplexer 320 to close the switch in the respective segment and to open the switches in the remaining segments. In this case, the output voltage (denoted “Vout” in
The multiplexer 320 also allows the temperature manager 230 to selectively couple two or more of the temperature sensors 211-214 to the ADC 225 at the same time. For example, the temperature manager 230 may command the multiplexer 320 to couple all of the temperature sensors to the ADC 225 by commanding the multiplexer 320 to close the switches in all of the segments. In this case, the output voltage (denoted “Vout” in
In general, the multiplexer 320 allows the temperature manager 230 to select any subset of the temperature sensors 211-214. For example, the temperature manager 230 may command the multiplexer 320 to couple a subset of the temperature sensors to the ADC 225 by commanding the multiplexer 320 to close the switches in the respective segments (i.e., the segments corresponding to the selected subset of temperature sensors) and to open the switch(es) in the remaining segment(s). In this case, the multiplexer provides an average temperature reading for the temperature sensors in the selected subset.
The multiplexer 420 allows the temperature manager 230 to selectively couple any one of the temperature sensors 211-214 to the ADC 225 at a time. For example, the temperature manager 230 may command the multiplexer 420 to couple a particular temperature sensor to the ADC 225 by commanding the multiplexer 420 to close the switch in the respective segment and to open the switches in the remaining segments. In this case, the output voltage (denoted “Vout”) is given by:
where Vs is the voltage of the selected temperature sensor. As shown in equation 1, the output voltage is proportional to the voltage of the selected temperature sensor, and therefore provides a temperature reading for the selected temperature sensor.
The multiplexer 420 also allows the temperature manager 230 to selectively couple two or more of the temperature sensors 211-214 to the ADC 225 at the same time. For example, the temperature manager 230 may command the multiplexer 420 to couple all of the temperature sensors to the ADC 225 by commanding the multiplexer 420 to close the switches in all of the segments. In this case, the output voltage (denoted “Vout”) is given by:
where Vs1-Vs4 are the voltages of temperature sensors 211-214, respectively. As shown in equation 2, the output voltage is proportional to the sum of the voltages of the temperature sensors, and therefore provides a sum of the temperature readings from the temperature sensors 211-214.
In general, the multiplexer 420 allows the temperature manager 230 to select any subset of the temperature sensors 211-214. For example, the temperature manager 230 may command the multiplexer 420 to couple a subset of the temperature sensors to the ADC 225 by commanding the multiplexer 420 to close the switches in the respective segments and to open the switch(es) in the remaining segment(s). In this case, the multiplexer provides a sum of the temperature readings from the temperature sensors in the selected subset.
The multiplexer 520 includes multiple switches 541-544 and a switch decoder 555, in which each switch is coupled between a respective one of the temperature sensors 211-214 (not shown in
The multiplexer 520 allows the temperature manager 230 to selectively couple any one of the temperature sensors 211-214 to the ADC 225 at a time. For example, the temperature manager 230 may command the multiplexer 520 to couple a particular temperature sensor to the ADC 225 by commanding the multiplexer 520 to close the switch corresponding to the selected temperature sensor and to open the remaining switches. In this case, the multiplexer 520 outputs the current from the selected temperature sensor. The ADC 225 may convert the sensor current into a digital temperature reading. For example, the ADC 225 may convert the current into a voltage by passing the current through a resistor (not shown), and then convert the voltage into the digital temperature reading.
The multiplexer 520 also allows the temperature manager 230 to selectively couple two or more of the temperature sensors 211-214 to the ADC 225 at the same time. For example, the temperature manager 230 may command the multiplexer 520 to couple all of the temperature sensors to the ADC 225 by commanding the multiplexer 520 to close all of the switches. In this case, the temperature sensors are shorted together, and the output current of the multiplexer 520 (denoted “Iout”) is approximately equal to the sum of the currents of the temperature sensors. Thus, in this case, the output of the multiplexer 520 provides a sum of the temperature readings from the temperature sensors 211-214.
In general, the multiplexer 520 allows the temperature manager 230 to select any subset of the temperature sensors 211-214. For example, the temperature manager 230 may command the multiplexer 520 to couple a subset of the temperature sensors to the ADC 225 by commanding the multiplexer 520 to close the respective switches and to open the remaining switch(es). In this case, the temperature sensors in the selected subset are shorted together, and the multiplexer 520 provides a sum of the temperature readings from the temperature sensors in the selected subset.
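The current-mode path can be made concrete with a small end-to-end model of the multiplexer 520 and the ADC 225. The sketch assumes that each sensor sources a current proportional to absolute temperature, and that the ADC converts the current to a voltage through a resistor before quantizing, as described above. All constants (`PTAT_A_PER_K`, `R_SENSE`, `V_REF`, `BITS`) and function names are hypothetical illustration values, not taken from the disclosure:

```python
from typing import List, Sequence

# Hypothetical constants chosen for illustration only.
PTAT_A_PER_K = 1e-6   # assumed sensor slope: I = k * T, with T in kelvin
R_SENSE = 1_000.0     # ohms; resistor converting the multiplexer output current to a voltage
V_REF = 2.0           # volts; assumed ADC full-scale input
BITS = 12             # assumed ADC resolution
FULL_SCALE = (1 << BITS) - 1


def sensor_current(temp_k: float) -> float:
    """Idealized sensor whose output current is proportional to absolute temperature."""
    return PTAT_A_PER_K * temp_k


def mux_current(currents: Sequence[float], selected: Sequence[int]) -> float:
    """Closing the selected switches shorts those sensor outputs together, so the currents add."""
    return sum(currents[i] for i in selected)


def adc_code(i_out: float) -> int:
    """Convert current to voltage through R_SENSE, then quantize and clamp to the ADC range."""
    v = i_out * R_SENSE
    code = round(v / V_REF * FULL_SCALE)
    return max(0, min(FULL_SCALE, code))


def average_temp_from_code(code: int, num_selected: int) -> float:
    """Invert the ADC and sensor models, then divide the summed reading by the sensor count."""
    v = code / FULL_SCALE * V_REF
    return v / R_SENSE / PTAT_A_PER_K / num_selected


temps_k: List[float] = [350.0, 355.0, 360.0, 365.0]  # hypothetical sensor temperatures
currents = [sensor_current(t) for t in temps_k]
code = adc_code(mux_current(currents, [0, 1, 2, 3]))
print(code, average_temp_from_code(code, 4))
```

Because the summed current can be several times larger than one sensor's current, the ADC's full-scale range (here the assumed `V_REF`) has to accommodate the largest subset that may be selected. The sketch clamps codes that would otherwise overflow.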
At step 610, the temperature manager 230 compares an average of the temperature readings from two or more temperature sensors 211-214 with a first temperature threshold. The first temperature threshold is below a second temperature threshold used to trigger temperature mitigation, as discussed further below. For example, the first temperature threshold may be 10° C. or more below the second temperature threshold. To obtain the average temperature reading, the temperature manager 230 may command the multiplexer 220 to couple the two or more temperature sensors to the temperature manager 230, as discussed above. The multiplexer 220 may be implemented using any of the multiplexers 320, 420 and 520 shown in
At step 620, the temperature manager 230 determines whether the average temperature reading exceeds the first temperature threshold. If the average temperature reading does not exceed the first temperature threshold, then the temperature manager 230 repeats steps 610 and 620 (e.g., after a predetermined time interval). If the average temperature reading exceeds the first temperature threshold, then the temperature manager 230 proceeds to step 630.
At step 630, the temperature manager 230 compares a temperature reading from one of the temperature sensors 211-214 with the second temperature threshold. The second temperature threshold may correspond to an upper temperature boundary for safe operation of the chip 200.
At step 640, the temperature manager 230 determines whether the temperature reading exceeds the second temperature threshold. If the temperature reading does not exceed the second temperature threshold, then the temperature manager 230 repeats steps 630 and 640. For example, the temperature manager 230 may repeat steps 630 and 640 using a temperature reading from another one of the temperature sensors 211-214. In this example, the temperature manager 230 may cycle through the temperature sensors 211-214 as steps 630 and 640 are repeated. If the temperature reading in step 640 exceeds the second temperature threshold, then the temperature manager 230 proceeds to step 650.
At step 650, the temperature manager 230 performs temperature mitigation. For example, if the region 210 includes a processor (e.g., CPU) on the chip 200, then the temperature manager 230 may perform temperature mitigation by reducing the operating frequency of the processor and/or reducing the supply voltage of the processor.
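Steps 610-650 can be sketched as a polling loop. In this minimal sketch, the sensor reads are supplied as callables (`read_average` and `read_sensor`, hypothetical names), and a `max_polls` budget bounds the loop so that it can be tested. The disclosure does not say when the manager falls back from steps 630/640 to step 610, so once the first threshold is exceeded, the sketch simply stays in the per-sensor loop:

```python
from itertools import cycle
from typing import Callable, Optional


def run_monitor(
    read_average: Callable[[], float],
    read_sensor: Callable[[int], float],
    num_sensors: int,
    first_threshold: float,
    second_threshold: float,
    max_polls: int = 1_000,
) -> Optional[int]:
    """Return the index of the sensor that triggers mitigation, or None after max_polls reads."""
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold should be below the second threshold")
    polls = 0
    # Steps 610/620: poll the averaged reading until it exceeds the first threshold.
    while polls < max_polls:
        polls += 1
        if read_average() > first_threshold:
            break
    else:
        return None
    # Steps 630/640: cycle through individual sensors, comparing each with the second threshold.
    for idx in cycle(range(num_sensors)):
        if polls >= max_polls:
            return None
        polls += 1
        if read_sensor(idx) > second_threshold:
            return idx  # step 650: the caller performs mitigation (e.g., lower frequency/voltage)
    return None  # reached only when num_sensors == 0


averages = iter([70.0, 85.0, 92.0])          # hypothetical successive averaged readings
sensor_readings = [95.0, 99.0, 103.0, 97.0]  # hypothetical readings of sensors 211-214
hot = run_monitor(lambda: next(averages), lambda i: sensor_readings[i], 4, 90.0, 100.0)
print(hot)
```

With the thresholds 10° C. apart, as in the example for step 610, the averaged reading of 92 triggers the per-sensor check, and index 2 (temperature sensor 213) triggers mitigation.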
Thus, in the example in
As discussed above, at step 610, the temperature manager 230 compares the average of the temperature readings from two or more temperature sensors 211-214 with the first temperature threshold. For the embodiments in which the multiplexer 220 provides the sum of the temperature readings from the two or more temperature sensors, the temperature manager 230 may determine the average temperature reading by dividing the sum of the temperature readings by the number of temperature sensors contributing to the sum. Alternatively, the temperature manager 230 may compare the sum of the temperature readings with a product of the first temperature threshold and the number of temperature sensors contributing to the sum. This is equivalent to comparing the average temperature reading with the first temperature threshold. Thus, in the present disclosure, it is to be understood that comparing the average temperature reading with the first temperature threshold also covers comparing the sum of the temperature readings with the product of the first temperature threshold and the number of temperature sensors contributing to the sum.
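The equivalence can be checked numerically. The helper `average_exceeds` below is a hypothetical name. Comparing against `threshold * N` avoids a division, which may be cheaper in hardware or firmware:

```python
def average_exceeds(readings_sum: float, num_contributing: int, threshold: float) -> bool:
    """Compare sum > threshold * N, which is equivalent to (sum / N) > threshold for N > 0."""
    if num_contributing <= 0:
        raise ValueError("at least one sensor must contribute to the sum")
    return readings_sum > threshold * num_contributing


# Four sensors summing to 372 against a first threshold of 92: 372 > 368, i.e., 93 > 92.
print(average_exceeds(372.0, 4, 92.0))
```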
In certain aspects, the temperature manager 230 may estimate the temperature at a location within the region 210 based on an average of the temperature readings from the temperature sensors. The location within the region 210 may correspond to an average of the locations of the temperature sensors 211-214. In this regard,
$$T_{Est} = \beta \cdot T_{Avg} \tag{3}$$
where TEst is the estimated temperature at location 710, TAvg is the average of the temperature readings, and β is a coefficient. The value of coefficient β may be determined by running thermal simulations of the chip 200 or testing a physical chip. For example, the temperatures at the sensor locations and the temperature at location 710 may be determined by running a thermal simulation of the chip or testing a physical chip. In this example, the value of coefficient β may be computed by averaging the temperatures at the sensor locations, and dividing the temperature at location 710 by the average temperature. It is to be appreciated that the value of coefficient β may also be determined using other techniques.
In another example, the temperature manager 230 may estimate the temperature at location 710 based on the sum of the temperature readings from the temperature sensors as follows:
$$T_{Est} = \beta' \cdot (T_{s1} + T_{s2} + T_{s3} + T_{s4}) \tag{4}$$
where TEst is the estimated temperature at location 710, Ts1 to Ts4 are the temperature readings from temperature sensors 211-214, respectively, and β′ is a coefficient. Coefficient β′ may be approximately equal to coefficient β divided by the number of temperature sensors contributing to the sum (e.g., four in this example).
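The following sketch calibrates β and applies Eqs. 3 and 4. It assumes that a single simulated (or measured) operating point supplies the temperatures at the sensor locations and at location 710; the numbers are hypothetical. Because β is a ratio, it must be computed and applied on the same temperature scale (for example, all in °C or all in kelvin):

```python
from statistics import fmean
from typing import Sequence


def calibrate_beta(sim_sensor_temps: Sequence[float], sim_location_temp: float) -> float:
    """beta = T(location 710) / average of the sensor-location temperatures (Eq. 3 solved for beta)."""
    return sim_location_temp / fmean(sim_sensor_temps)


def estimate_from_average(beta: float, readings: Sequence[float]) -> float:
    """Eq. 3: T_Est = beta * T_Avg."""
    return beta * fmean(readings)


def estimate_from_sum(beta_prime: float, readings: Sequence[float]) -> float:
    """Eq. 4: T_Est = beta' * (Ts1 + ... + TsN), with beta' = beta / N."""
    return beta_prime * sum(readings)


beta = calibrate_beta([80.0, 82.0, 84.0, 86.0], 91.3)  # hypothetical simulation results
beta_prime = beta / 4                                   # four sensors contribute to the sum
live = [78.0, 81.0, 83.0, 84.0]                         # hypothetical live readings
print(estimate_from_average(beta, live), estimate_from_sum(beta_prime, live))
```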
In general, the temperature manager 230 may estimate the temperature at different locations using different subsets of the temperature sensors 211-214. In this regard,
Thus, the temperature manager 230 is able to estimate the temperature at different locations by having the multiplexer 220 select different subsets of the temperature sensors 211-214. Although four temperature sensors 211-214 are shown in the examples in
In certain aspects, two or more temperature sensors may be placed on the chip based on an estimated hotspot location on the chip. For example, the two or more temperature sensors may be placed on the chip such that the average of the locations of the two or more temperature sensors (centroid of the two or more temperature sensors) is located at or near the hotspot location. This allows the temperature manager 230 to estimate the temperature at the hotspot location. The hotspot location may be estimated by running thermal simulations of the chip and looking for a location on the chip with a peak temperature.
An advantage of this embodiment is that it allows the temperature manager 230 to estimate the temperature at the hotspot location without physically placing a temperature sensor there. Placing a sensor at the hotspot may be difficult because the hotspot may be located in an area of the chip densely populated with transistors and interconnecting metal wires.
In certain aspects, the hotspot location may change for different operating conditions of the chip (e.g., depending on which processors on the chip are active at a given time). The hotspot locations for the different operating conditions may be determined by running thermal simulations of the chip for each operating condition and looking for a location on the chip with a peak temperature for each operating condition. In this example, the temperature sensors may be placed on the chip such that different subsets of the temperature sensors correspond to different hotspot locations. More particularly, each subset of the temperature sensors may be placed on the chip such that the average of the locations of the temperature sensors in the subset (centroid of the temperature sensors in the subset) is located at or near the respective hotspot location. It is to be appreciated that two or more subsets may have one or more temperature sensors in common.
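The following sketch finds which subset of sensors has its centroid nearest a given hotspot using an exhaustive search, which is practical for the handful of sensors in the examples here. The coordinates, hotspot locations, and function names are hypothetical:

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Average of the sensor locations."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def best_subset_for_hotspot(
    sensor_xy: Sequence[Point], hotspot: Point, min_size: int = 2
) -> Tuple[int, ...]:
    """Return the sensor-index subset whose centroid lies closest to the hotspot."""
    candidates = (
        subset
        for size in range(min_size, len(sensor_xy) + 1)
        for subset in combinations(range(len(sensor_xy)), size)
    )
    return min(
        candidates,
        key=lambda s: math.dist(centroid([sensor_xy[i] for i in s]), hotspot),
    )


# Hypothetical sensor coordinates (e.g., in micrometers) at the corners of a square region.
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
print(best_subset_for_hotspot(sensors, (5.0, 0.0)))   # hotspot for one operating condition
print(best_subset_for_hotspot(sensors, (10.0, 5.0)))  # hotspot for another operating condition
```

In this example, both subsets contain sensor index 1, which illustrates that subsets for different hotspots may share sensors.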
In certain aspects, the temperature manager 230 may estimate the temperature at a location on the chip based on temperature readings from all of the temperature sensors or a subset of the temperature sensors, as discussed above, and compare the estimated temperature with a temperature threshold. The location may correspond to an estimated hotspot location on the chip 200, and the temperature threshold may correspond to an upper temperature boundary for safe operation of the chip. If the estimated temperature exceeds the temperature threshold, then the temperature manager 230 may perform temperature mitigation, as discussed above.
In certain aspects, the temperature manager 230 may estimate the temperature at a location on the chip based on temperature readings from all of the temperature sensors or a subset of the temperature sensors, as discussed above, and compare the estimated temperature with the second temperature threshold at step 630 in
In certain aspects, the temperature manager 230 may estimate the temperature at a location on the chip based on a weighted sum of temperature readings from two or more temperature sensors. The location may be within the region 210 or outside the region 210. For example, the temperature manager 230 may estimate the temperature at a location on the chip as follows:
$$T_{Est} = \alpha_1 T_{s1} + \alpha_2 T_{s2} + \alpha_3 T_{s3} + \alpha_4 T_{s4} \tag{5}$$
where TEst is the estimated temperature at the location, Ts1 to Ts4 are the temperature readings from temperature sensors 211-214, respectively, and α1 to α4 are weights assigned to temperature sensors 211-214, respectively. As shown in equation 5, the temperature readings may be weighted differently by assigning different weights to the temperature sensors 211-214: a reading from a temperature sensor assigned a higher weight contributes more to the estimate than a reading from a temperature sensor assigned a lower weight. Methods for determining the weights will be discussed further below.
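Eq. 5 reduces to a dot product between the weights and the readings. The following is a minimal sketch with hypothetical values. Equal weights of 1/N reduce Eq. 5 to the plain average, and equal weights of β/N reproduce Eq. 3:

```python
from typing import Sequence


def weighted_estimate(readings: Sequence[float], weights: Sequence[float]) -> float:
    """Eq. 5: T_Est = a1*Ts1 + a2*Ts2 + ...; a weight of 0 excludes that sensor."""
    if len(readings) != len(weights):
        raise ValueError("need exactly one weight per sensor reading")
    return sum(a * t for a, t in zip(weights, readings))


readings = [90.0, 85.0, 80.0, 75.0]  # hypothetical readings from sensors 211-214
print(weighted_estimate(readings, [0.4, 0.3, 0.2, 0.1]))
print(weighted_estimate(readings, [0.25] * 4))  # equal weights: plain average
```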
In these aspects, the multiplexer 220 may be configured to provide a weighted sum of temperature readings from two or more selected temperature sensors. In this regard,
The multiplexer 820 is similar to the multiplexer 420 shown in
where Vs1-Vs4 are the voltages of temperature sensors 211-214, respectively. As shown in equation 6, the output voltage is proportional to a weighted sum of the voltages of the temperature sensors, in which the weight of each sensor voltage is inversely proportional to the resistance of the respective variable resistor. Thus, the resistance of each variable resistor R1-R4 controls the weight of the respective temperature sensor 211-214.
Thus, the temperature manager 230 may assign different weights to the temperature sensors by controlling the resistances of the variable resistors R1-R4 accordingly. In this regard, the weight decoder 855 may control the resistance of each variable resistor R1-R4 based on a weight control signal received from the temperature manager 230. For example, the weight control signal may specify a weight for each temperature sensor. In this example, the weight decoder 855 may set the resistance of each variable resistor R1-R4 according to the specified weight for the respective temperature sensor. Thus, the multiplexer 820 not only allows the temperature manager 230 to select which of the temperature sensors 211-214 are coupled to the ADC 225, but also to control the weight of each temperature sensor.
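Because equation 6 is not reproduced in this text, the sketch below assumes the simplest relation consistent with the description, α_i = R_unit / R_i, where `R_unit` is a hypothetical reference resistance. The actual proportionality constant depends on the circuit of the multiplexer 820. The sketch shows how a weight decoder might translate requested weights into resistor settings, mapping a weight of 0 to an open circuit:

```python
import math
from typing import List, Sequence


def weights_to_resistances(weights: Sequence[float], r_unit: float) -> List[float]:
    """Hypothetical weight decoder: R_i = r_unit / a_i, so the weight is inversely proportional to R_i.

    A weight of 0 maps to an open circuit (math.inf), i.e., the sensor is not selected.
    """
    out = []
    for a in weights:
        if a < 0:
            raise ValueError("weights must be non-negative")
        out.append(math.inf if a == 0 else r_unit / a)
    return out


def resistances_to_weights(resistances: Sequence[float], r_unit: float) -> List[float]:
    """Inverse mapping: a_i = r_unit / R_i (0 for an open circuit)."""
    return [0.0 if math.isinf(r) else r_unit / r for r in resistances]


rs = weights_to_resistances([0.5, 0.25, 0.0, 1.0], 10_000.0)  # hypothetical weights for R1-R4
print(rs)
```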
Each variable resistor R1-R4 may be implemented using a programmable resistor network. In this regard,
In operation, a resistor in the network 912 is selected by closing the respective switch, and a resistor in the network 912 is unselected by opening the respective switch. The resistance of the variable resistor 910 is determined by the resistances of the selected resistors 915-1 to 915-n. Thus, the weight decoder 855 (not shown in
In this example, the variable resistor 910 may be set to any one of a plurality of different resistances corresponding to a plurality of different weights, in which each one of the plurality of different resistances corresponds to a different selection of resistors in the network 912. The greater the number of resistors 915-1 to 915-n in the network, the greater the number of available resistances. Thus, increasing the number of resistors 915-1 to 915-n in the network 912 allows the temperature manager 230 to adjust the resistance of the variable resistor (and hence adjust the weight of the corresponding temperature sensor) with finer granularity.
In this example, the variable resistor 910 may also control whether the respective temperature sensor is selected. In this regard, if the respective temperature sensor is not selected, then the switch decoder 355 may open all of the switches 920-1 to 920-n in the network 912, thereby decoupling the respective temperature sensor from the ADC 225. Thus, in this example, the switches 341-344 shown in
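The following sketch enumerates the available resistances. It assumes that each resistor 915-1 to 915-n is in series with its switch 920-1 to 920-n and that the branches are connected in parallel. This assumption is consistent with opening all switches decoupling the sensor, but it is not stated explicitly here. Function names are hypothetical:

```python
import math
from itertools import product
from typing import List, Sequence


def network_resistance(resistors: Sequence[float], closed: Sequence[bool]) -> float:
    """Equivalent resistance of the parallel branches whose switches are closed (inf if none)."""
    conductance = sum(1.0 / r for r, c in zip(resistors, closed) if c)
    return math.inf if conductance == 0 else 1.0 / conductance


def available_resistances(resistors: Sequence[float]) -> List[float]:
    """All distinct resistances reachable with at least one switch closed."""
    values = set()
    for closed in product((False, True), repeat=len(resistors)):
        if any(closed):
            values.add(round(network_resistance(resistors, closed), 6))
    return sorted(values)


print(available_resistances([1000.0, 2000.0]))
print(available_resistances([1000.0, 2000.0, 4000.0]))
```

With n branches, there are at most 2^n - 1 settings with at least one switch closed. Binary-weighted resistors reach that bound, whereas n identical resistors give only n distinct values (R, R/2, ..., R/n).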
The multiplexer 1020 is similar to the multiplexer 520 shown in
Each switch 541-544 controls whether the scaled current for the respective temperature sensor is coupled to the ADC 225. More particularly, the switch decoder 555 closes the switches 541-544 corresponding to the selected temperature sensors. The output current (denoted “Iout”) is the sum of the scaled currents of the selected temperature sensors.
In this example, the temperature manager 230 may assign different weights to the temperature sensors by controlling the scaling factors of the current scalers 1031-1034. For example, the temperature manager 230 may assign a higher weight to a temperature sensor by increasing the scaling factor of the respective current scaler, and assign a lower weight to a temperature sensor by decreasing the scaling factor of the respective current scaler. In this regard, the weight decoder 1055 may control the scaling factor of each current scaler 1031-1034 based on a weight control signal received from the temperature manager 230. For example, the weight control signal may specify a weight for each temperature sensor. In this example, the weight decoder 1055 may set the scaling factor of each current scaler 1031-1034 according to the specified weight for the respective temperature sensor. Thus, the multiplexer 1020 not only allows the temperature manager 230 to select which of the temperature sensors 211-214 are coupled to the ADC 225, but also to control the weight of each temperature sensor by controlling the scaling factor of the respective current scaler.
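The following is a behavioral sketch of the multiplexer 1020 that assumes ideal current scalers; the currents and scaling factors are hypothetical values. If each sensor current is proportional to temperature with the same slope, then Iout is proportional to a weighted sum of the readings of the form in Eq. 5, with each weight proportional to the corresponding scaling factor:

```python
from typing import AbstractSet, Sequence


def mux_1020_output(
    sensor_currents: Sequence[float],
    scale_factors: Sequence[float],
    selected: AbstractSet[int],
) -> float:
    """Iout = sum over the selected sensors of (scaling factor x sensor current)."""
    return sum(sensor_currents[i] * scale_factors[i] for i in selected)


currents = [300e-6, 310e-6, 320e-6, 330e-6]  # hypothetical sensor currents (A)
scales = [1.0, 2.0, 1.0, 0.5]                # hypothetical settings of current scalers 1031-1034
print(mux_1020_output(currents, scales, {0, 1, 3}))
```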
The current scaler 1110 also includes an input transistor 1125 having a drain coupled to the respective temperature sensor (not shown in
In operation, a scaling transistor is selected by closing the respective switch, and a scaling transistor is unselected by opening the respective switch. The overall scaling factor of the current scaler 1110 is determined by a sum of the scaling factors of the selected scaling transistors. Thus, the weight decoder 1055 (not shown in
In this example, the current scaler 1110 may be set to any one of a plurality of different scaling factors corresponding to a plurality of different weights, in which each one of the plurality of different scaling factors corresponds to a different selection of scaling transistors. The greater the number of scaling transistors in the current scaler, the greater the number of available scaling factors. Thus, increasing the number of scaling transistors in the current scaler allows the temperature manager 230 to adjust the scaling factor of the current scaler (and hence adjust the weight of the corresponding temperature sensor) with finer granularity.
In this example, the current scaler 1110 may also control whether the respective temperature sensor is selected. In this regard, if the respective temperature sensor is not selected, then the switch decoder 555 may open all of the switches 1130-1 to 1130-m in the current scaler 1110. Thus, in this example, the switches 541-544 shown in
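The text does not specify how the weight decoder 1055 maps a requested weight onto the switches 1130-1 to 1130-m. One plausible approach is to pick the switch pattern whose summed scaling factor is closest to the request. The sketch below takes that approach and assumes binary-weighted scaling transistors (factors 0.25, 0.5, and 1.0). It also folds deselection (all switches open) into the same function. `decode_weight` and `scaler_factor` are hypothetical names:

```python
from itertools import product
from typing import Sequence, Tuple


def scaler_factor(unit_factors: Sequence[float], closed: Sequence[bool]) -> float:
    """Overall scaling factor = sum of the factors of the selected scaling transistors."""
    return sum(f for f, c in zip(unit_factors, closed) if c)


def decode_weight(requested: float, unit_factors: Sequence[float]) -> Tuple[bool, ...]:
    """Hypothetical weight decoder: choose the switch pattern whose factor is closest to the request.

    A requested weight of 0 opens every switch, which deselects the sensor.
    """
    if requested <= 0:
        return tuple(False for _ in unit_factors)
    patterns = (p for p in product((False, True), repeat=len(unit_factors)) if any(p))
    return min(patterns, key=lambda p: abs(scaler_factor(unit_factors, p) - requested))


unit = [0.25, 0.5, 1.0]  # assumed binary-weighted scaling transistors
pattern = decode_weight(1.3, unit)
print(pattern, scaler_factor(unit, pattern))
```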
In the example in
As discussed above, the temperature manager 230 may estimate the temperature at a location on the chip based on a weighted sum of temperature readings from two or more temperature sensors. For example, the temperature manager 230 may estimate the temperature at a location on the chip according to equation 5 discussed above. The weights α1 to α4 of the temperature sensors for different locations may be determined using various techniques, examples of which are provided below.
For example, the weights assigned to the temperature sensors for a location on the chip may be determined based in part on distances of the temperature sensors from the location. In this example, a temperature sensor located closer to the location may be assigned a larger weight than a temperature sensor located farther from the location. In another example, the weights assigned to the temperature sensors for a location on the chip may be determined based in part on a temperature gradient map of the region 210. The temperature gradient map may be generated by running thermal simulations of the chip 200 or testing a physical chip. In this example, the gradient map may be used to estimate a change in temperature between the location and each of the temperature sensors. A temperature sensor with a smaller change in temperature may be assigned a higher weight than a temperature sensor with a larger change in temperature. Factors that may be considered in determining the weights assigned to the temperature sensors for a location on the chip may include shapes of the circuit blocks in the region 210, layout of the circuit blocks in the region 210, and/or placement of the temperature sensors. Other factors that may be considered are discussed further below.
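One simple distance-based scheme is inverse-distance weighting. It is shown here as a swapped-in example, not as the disclosure's method. Each weight is proportional to 1/d^p, where d is the sensor's distance from the target location and p is a hypothetical tuning exponent, and the weights are normalized to sum to 1:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def inverse_distance_weights(
    sensor_xy: Sequence[Point], location: Point, power: float = 1.0
) -> List[float]:
    """Weights proportional to 1 / distance**power, normalized to sum to 1.

    Sensors located exactly at the target location share all of the weight.
    """
    dists = [math.dist(p, location) for p in sensor_xy]
    zeros = sum(1 for d in dists if d == 0)
    if zeros:
        return [1.0 / zeros if d == 0 else 0.0 for d in dists]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


w = inverse_distance_weights([(0.0, 0.0), (10.0, 0.0)], (2.5, 0.0))
print(w)
```

Normalizing the weights to 1 makes Eq. 5 a weighted average. Any remaining scale factor (like β in Eq. 3) could then be absorbed when the weights are refined against simulation or test data.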
The determined weights for a location on the chip may be refined based on thermal simulations of the chip 200 or tests on a physical chip. For example, a thermal simulation or test may be used to determine the temperatures at the locations of the temperature sensors and the location on the chip. The temperatures at the locations of the temperature sensors may then be input to equation 5 using the determined weights to compute an estimate of the temperature at the location on the chip. The estimated temperature at the location may then be compared with the temperature at the location determined by the simulation or test.
If the difference (error) between the estimated temperature at the location and the temperature at the location determined by the simulation or test is equal to or below an error threshold, then the determined weights may be used for the location. If the error is above the error threshold, then one or more of the weights may be adjusted. The temperature at the location may then be re-estimated using the adjusted weights, and the new estimate may again be compared with the temperature determined by the simulation or test. This process of adjusting the weights and recomputing the error may be repeated until the error is equal to or below the error threshold. At that point, the most recently adjusted weights may be used for the location.
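The text leaves the weight-adjustment rule open. The following minimal sketch supplies one by assumption: a normalized least-mean-squares (NLMS) step. On each iteration, this step moves the estimate toward the simulated or measured temperature by a fixed fraction (`step`) of the error. The sketch also assumes the estimate has the weighted-sum form T = Σ αi·Ti described above. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def weighted_estimate(weights: Sequence[float], sensor_temps: Sequence[float]) -> float:
    """Weighted sum of the temperatures at the sensor locations."""
    return sum(w * t for w, t in zip(weights, sensor_temps))


def refine_weights(
    weights: Sequence[float],
    sensor_temps: Sequence[float],
    true_temp: float,
    error_threshold: float,
    step: float = 0.5,
    max_iters: int = 100,
) -> List[float]:
    """Adjust weights until |estimate - true_temp| <= error_threshold (assumed NLMS rule)."""
    w = list(weights)
    norm = sum(t * t for t in sensor_temps)
    for _ in range(max_iters):
        error = weighted_estimate(w, sensor_temps) - true_temp
        if abs(error) <= error_threshold:
            return w  # weights accepted for this location
        # NLMS step: the new estimate equals the old estimate minus step * error.
        w = [wi - step * error * ti / norm for wi, ti in zip(w, sensor_temps)]
    raise RuntimeError("error did not fall to or below the threshold")


# Hypothetical simulation result: four sensor temperatures and the temperature at the location.
sensor_temps = [80.0, 85.0, 90.0, 95.0]
refined = refine_weights([0.25] * 4, sensor_temps, true_temp=92.0, error_threshold=0.1)
print(refined, weighted_estimate(refined, sensor_temps))
```

With this rule the error is multiplied by (1 - `step`) on every iteration, so any `step` between 0 and 2 converges for a single simulation case. The `max_iters` guard keeps the loop bounded if the threshold is unreachable.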
It is to be appreciated that the temperature at a location on the chip may be estimated using a subset of the temperature sensors instead of all of the temperature sensors. In this case, the weights assigned to the temperature sensors in the subset may be determined using the above techniques, in which the estimated temperature at the location on the chip is computed using the weighted sum of the temperatures at the locations of the temperature sensors in the subset.
The above techniques may be used to determine weights for each one of a plurality of different locations on the chip. For example, the different locations may correspond to different hotspot locations for different operating conditions of the chip (e.g., different use cases of the chip). The hotspot locations for the different operating conditions may be determined by running thermal simulations of the chip for each operating condition and looking for a location on the chip with a peak temperature for each operating condition.
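Locating the hotspot for each operating condition amounts to finding the peak of each simulated temperature map. This is a minimal sketch, assuming each thermal simulation yields a 2-D grid of temperatures indexed by (row, column). The function name and the example maps are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def hotspot_locations(sim_maps: Dict[str, Grid]) -> Dict[str, Tuple[int, int]]:
    """Return the (row, col) of the peak temperature for each operating condition."""
    hotspots: Dict[str, Tuple[int, int]] = {}
    for condition, grid in sim_maps.items():
        cells = ((r, c) for r, row in enumerate(grid) for c in range(len(row)))
        hotspots[condition] = max(cells, key=lambda rc: grid[rc[0]][rc[1]])
    return hotspots


# Hypothetical 2x2 simulated maps for two use cases.
maps = {
    "cpu_heavy": [[50.0, 60.0], [55.0, 70.0]],
    "gpu_heavy": [[80.0, 60.0], [55.0, 70.0]],
}
print(hotspot_locations(maps))  # {'cpu_heavy': (1, 1), 'gpu_heavy': (0, 0)}
```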
The weights of the temperature sensors for a particular hotspot location may be determined using the above techniques under operating conditions corresponding to the hotspot location. The operating conditions may correspond to high activity (e.g., high operating frequency) of one or more processors (e.g., CPU, GPU, etc.) in the region 210. In this example, the weights of the temperature sensors for the hotspot location may be refined using the above techniques based on thermal simulations of the chip 200 under the operating conditions corresponding to the hotspot location. In this case, the weights for the temperature sensors may correspond to a particular location on the chip (i.e., the hotspot location) and certain operating conditions (i.e., the operating conditions corresponding to the hotspot location).
The weights for each of the different locations on the chip may be stored in a lookup table in a memory on the chip. The memory may be an internal memory in the temperature manager 230 or a memory coupled to the temperature manager 230. In this example, the temperature manager 230 may estimate the temperature at a particular location on the chip by retrieving (looking up) the corresponding weights from the lookup table in the memory, and sending a weight control signal to the multiplexer 220 according to the retrieved weights. The multiplexer 220 may then set the resistances of the variable resistors R1-R4 or set the current scaling factors of the current scalers 1031-1034 according to the received weight control signal, as discussed above.
In certain aspects, the temperature manager 230 may estimate the temperature at a location on the chip based on a weighted sum of the temperature readings from all of the temperature sensors or a subset of the temperature sensors, as discussed above, and compare the estimated temperature with a temperature threshold. The temperature threshold may correspond to an upper temperature boundary for safe operation of the chip. If the estimated temperature exceeds the temperature threshold, then the temperature manager 230 may perform temperature mitigation, as discussed above.
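The lookup-table flow and threshold check described above can be sketched in software. This is a minimal sketch, assuming the weighted sum is formed digitally from individual readings. In the described system, the multiplexer 220 instead forms the weighted sum in the analog domain using the retrieved weights. The table contents, location names, and threshold value are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical lookup table: location -> weights for temperature sensors 211-214.
WEIGHT_LUT: Dict[str, List[float]] = {
    "cpu_hotspot": [0.4, 0.3, 0.2, 0.1],
    "gpu_hotspot": [0.1, 0.2, 0.3, 0.4],
}

TEMP_THRESHOLD_C = 95.0  # hypothetical upper boundary for safe operation


def estimate_temperature(location: str, readings: Sequence[float]) -> float:
    """Look up the weights for a location and form the weighted sum of the readings."""
    weights = WEIGHT_LUT[location]
    return sum(w * t for w, t in zip(weights, readings))


def needs_mitigation(location: str, readings: Sequence[float]) -> bool:
    """True if the estimated temperature at the location exceeds the threshold."""
    return estimate_temperature(location, readings) > TEMP_THRESHOLD_C


readings = [110.0, 100.0, 85.0, 80.0]
print(estimate_temperature("cpu_hotspot", readings))  # approximately 99.0
print(needs_mitigation("cpu_hotspot", readings), needs_mitigation("gpu_hotspot", readings))
```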
In certain aspects, the temperature manager 230 may estimate the temperature at a location on the chip based on a weighted sum of temperature readings, as discussed above, and compare the estimated temperature with the second temperature threshold at step 630 in
In certain aspects, the temperature management system may include multiple multiplexers. For example, the temperature management system may include multiple instances of multiplexer 220 and ADC 225 to monitor temperature at different locations. In this example, each of the multiplexers may be configured to provide a weighted sum of the temperature readings from all of the temperature sensors or a subset of the temperature sensors, and each of the ADCs may be configured to convert the weighted sum from a respective one of the multiplexers into the digital domain for processing by the temperature manager 230. In this example, the temperature manager 230 may set the weights of the multiplexers differently so that the weighted sum from each multiplexer corresponds to a different location. This allows the temperature manager 230 to monitor the temperature at the different locations using the multiple multiplexers instead of using a single multiplexer that is time multiplexed between the different locations (which may require changing the weights of the single multiplexer for each location).
A binary search algorithm may be performed to locate a hotspot on a chip according to certain aspects. In this regard, an exemplary binary search is described below with reference to
An exemplary binary search will be now described with reference to the exemplary temperature sensor layout shown in
In a first step, the region 1210 is divided (partitioned) into two smaller triangular regions 1310 and 1320, as shown in
The temperature manager 230 may compute the temperature for region 1320 in a similar manner using temperature readings from temperature sensors 1211, 1219, 1215, 1216, 1217 and 1218.
After computing the temperatures for regions 1310 and 1320, the temperature manager 230 may compare the temperature for region 1310 with the temperature for region 1320 to determine which one is higher. The temperature manager 230 may then narrow the search to the region having the higher temperature. In the example in
In a subsequent step, the temperature manager 230 divides (partitions) the region with the highest temperature in the previous step into two smaller regions, as shown in
After computing the temperatures for regions 1330 and 1335, the temperature manager 230 may compare the temperature for region 1330 with the temperature for region 1335 to determine which one is higher. The temperature manager 230 may then narrow the search to the region having the higher temperature. In the example in
In a subsequent step, the temperature manager 230 divides (partitions) the region with the highest temperature in the previous step into two smaller regions, as shown in
After computing the temperatures for regions 1340 and 1345, the temperature manager 230 may compare the temperature for region 1340 with the temperature for region 1345 to determine which one is higher. The temperature manager 230 may then narrow the search to the region having the higher temperature. In the example in
Thus, in each successive step, the binary search is narrowed to a smaller region. For example, in each successive step, the binary search may be narrowed to a region that is approximately half the size of the region in the previous step. In this example, the search area is reduced by half in each successive step. The binary search may continue until the search is narrowed to a region of a certain size. At this point, the temperature manager 230 may estimate the hotspot temperature as the temperature for that region. Alternatively, the temperature manager 230 may individually check the temperature reading from each of the temperature sensors corresponding to the region, and use the highest temperature reading as the estimate of the hotspot temperature. The temperature sensors corresponding to the region may include temperature sensors lying along the perimeter (boundary) of the region and/or within the region. In another example, the temperature manager 230 may estimate the temperature at two or more different locations within the region using any of the techniques discussed above. In this example, the temperature manager 230 may use the highest estimated temperature as the estimate of the hotspot temperature.
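The partition, compare, and narrow loop described above can be sketched as follows. This is a minimal sketch under several assumptions:

- Regions are axis-aligned rectangles split at the midpoint of their longer side. The example in the text uses triangular regions.
- A region's value is the average of the readings from sensors on its boundary or inside it.
- Sensors on the dividing line are shared by both halves.
- The hotspot temperature is taken as the highest single reading in the final region, which is one of the options described above.

The function and type names are hypothetical.

```python
from typing import List, Tuple

Sensor = Tuple[float, float, float]       # (x, y, reading)
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def sensors_in(rect: Rect, sensors: List[Sensor]) -> List[Sensor]:
    """Sensors lying on the boundary of, or inside, the rectangle."""
    xmin, ymin, xmax, ymax = rect
    return [s for s in sensors if xmin <= s[0] <= xmax and ymin <= s[1] <= ymax]


def region_value(rect: Rect, sensors: List[Sensor]) -> float:
    """Average reading of the sensors for a region (-inf if it has none)."""
    inside = sensors_in(rect, sensors)
    return sum(s[2] for s in inside) / len(inside) if inside else float("-inf")


def split(rect: Rect) -> Tuple[Rect, Rect]:
    """Divide a rectangle into two halves across its longer side."""
    xmin, ymin, xmax, ymax = rect
    if xmax - xmin >= ymax - ymin:
        xm = (xmin + xmax) / 2
        return (xmin, ymin, xm, ymax), (xm, ymin, xmax, ymax)
    ym = (ymin + ymax) / 2
    return (xmin, ymin, xmax, ym), (xmin, ym, xmax, ymax)


def binary_hotspot_search(
    rect: Rect, sensors: List[Sensor], min_size: float
) -> Tuple[Rect, float]:
    """Narrow to the half with the higher value until the region is small enough."""
    if min_size <= 0:
        raise ValueError("min_size must be positive")
    if not sensors_in(rect, sensors):
        raise ValueError("no sensors in the starting region")
    while max(rect[2] - rect[0], rect[3] - rect[1]) > min_size:
        first, second = split(rect)
        v1, v2 = region_value(first, sensors), region_value(second, sensors)
        # The chosen half always contains at least one sensor, since the halves cover rect.
        rect = first if v1 >= v2 else second
    # Hotspot estimate: highest individual reading in the final region.
    return rect, max(s[2] for s in sensors_in(rect, sensors))


# Hypothetical 3x3 sensor grid with a hotspot near (2, 2).
readings = {
    (0, 0): 60.0, (1, 0): 62.0, (2, 0): 65.0,
    (0, 1): 63.0, (1, 1): 70.0, (2, 1): 75.0,
    (0, 2): 64.0, (1, 2): 78.0, (2, 2): 90.0,
}
sensors = [(float(x), float(y), t) for (x, y), t in readings.items()]
print(binary_hotspot_search((0.0, 0.0, 2.0, 2.0), sensors, min_size=1.0))
# ((1.0, 1.0, 2.0, 2.0), 90.0)
```

Each step halves the longer side and therefore the area. As a result, the number of comparisons grows only logarithmically with the ratio of the starting area to the final area. The same loop applies unchanged to current-sensor readings.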
After estimating the hotspot temperature using the binary search, the temperature manager 230 may compare the estimated hotspot temperature with a temperature threshold (e.g., the second temperature threshold at step 630 in
The binary search allows the temperature manager 230 to quickly locate a hotspot on the chip and accurately estimate the temperature of the hotspot. This reduces the negative impact of temperature management on performance (e.g., processor performance) by reducing the number of unnecessary temperature mitigations (which reduce performance), as discussed further below.
In a conventional temperature management system, temperature is only determined at the temperature sensor locations, which can be located away from a hotspot. To account for this, the conventional temperature manager adds a temperature margin to a temperature reading to ensure that the hotspot temperature does not exceed the temperature threshold. The temperature margin is typically based on the worst-case difference between the temperature reading and the hotspot temperature. In many cases, the actual difference between the temperature reading and the hotspot temperature is less than the temperature margin. Thus, the temperature margin frequently causes the temperature manager to initiate temperature mitigation while the actual hotspot temperature is still below the temperature threshold, resulting in unnecessary decreases in performance. By allowing the temperature manager 230 to more accurately estimate the hotspot temperature, the binary search allows the temperature manager 230 to reduce the temperature margin. This reduces the number of unnecessary temperature mitigations, and therefore reduces the negative impact on performance.
The exemplary search algorithm is not limited to temperature sensors. For example, the search algorithm may be applied to current sensors used to monitor current draws from a power distribution network. In this regard,
In this example, the blocks 1411-1414 draw currents I1-I4 from the power distribution network 1400. The amount of current drawn by a block may depend, for example, on whether the block is in an active state or an idle state, the operating frequency of the block, etc. For instance, the current drawn by a block may increase as the operating frequency of the block increases.
The currents I1-I4 drawn by the blocks 1411-1414 may be monitored using current sensors integrated on the chip. In this regard,
The amplifier 1525 has a first input coupled to the drain of the power transistor 1505, a second input coupled to the drain of the first transistor 1515, and an output coupled to the gate of the second transistor 1520. In operation, the amplifier 1525 senses the difference (error) between the drain voltages of the power transistor 1505 and the first transistor 1515. It then adjusts the gate voltage of the second transistor 1520 in a direction that reduces the difference (error) between the drain voltages of the power transistor 1505 and the first transistor 1515. In other words, the amplifier 1525 is coupled in a feedback loop that forces the drain voltage of the first transistor 1515 to be approximately equal to the drain voltage of the power transistor 1505. This helps ensure that the current (i.e., sensor current Is) flowing through the first and second transistors 1515 and 1520 tracks the current flowing through the power transistor 1505. The sensor current therefore provides a measurement of the current flowing through the power transistor 1505 into block 1508 (i.e., the current drawn by block 1508).
The sensor current Is may be a scaled-down version of the current flowing through the power transistor 1505 into block 1508. In this example, the ratio of the sensor current Is to the current flowing through the power transistor 1505 may be set by the ratio of the channel width of the first transistor 1515 to the channel width of the power transistor 1505. In certain aspects, this ratio may be 1:100 or smaller (i.e., the power transistor 1505 may be 100 or more times wider than the first transistor 1515), so that the sensor current is much lower than the current flowing through the power transistor 1505. Keeping the sensor current Is low reduces the power consumption of the current sensor.
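The width-ratio relationship above can be expressed as a one-line conversion. This is a minimal sketch, assuming ideal matching between the first transistor 1515 and the power transistor 1505 and a feedback loop that holds their drain voltages equal. Under those assumptions, Is = I_load × (W_1515 / W_1505). The function name and example values are hypothetical.

```python
def load_current_from_sensor(sensor_current: float, w_sense: float, w_power: float) -> float:
    """Recover the block current from the sensor current using the channel-width ratio.

    Assumes ideal device matching, so that Is / I_load = w_sense / w_power.
    """
    if w_sense <= 0 or w_power <= 0:
        raise ValueError("channel widths must be positive")
    return sensor_current * (w_power / w_sense)


# Hypothetical 1:100 sizing: 10 mA of sensor current corresponds to about 1 A into the block.
print(load_current_from_sensor(0.010, w_sense=1.0, w_power=100.0))
```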
The multiplexer 1620 is configured to selectively couple one or more of the current sensors 1611-1614 to the ADC 1630 at a time based on a multiplexer control signal (denoted “MUX control” in
In this example, the current manager 1640 may compare a current reading from one of the current sensors with a current threshold, and perform current mitigation if the current reading exceeds the current threshold. The current threshold may correspond to a maximum current that can be safely supplied to the corresponding block. If the current reading exceeds the current threshold, then the current manager 1640 may reduce the current drawn by the corresponding block. For example, the current manager 1640 may reduce the current by reducing the operating frequency of the block. The current manager 1640 may perform the above steps for each one of the current sensors.
In another example, the current manager 1640 may command the multiplexer 1620 to couple two or more of the current sensors 1611-1614 to the ADC 1630 at the same time. In this example, the multiplexer 1620 may be configured to sum the current readings (current measurements) from the two or more current sensors, and output the sum of the current readings to the ADC 1630. The ADC 1630 may then convert the sum of the current readings into digital form, and output the sum of the current readings in digital form to the current manager 1640. In this example, the multiplexer 1620 may be implemented using the multiplexer 520 shown in
In this example, the current manager 1640 may compare the sum of the current readings from the current sensors with a current threshold, and perform current mitigation if the sum exceeds the current threshold. The current threshold may correspond to a maximum total current that can be supplied to the blocks. If the sum of the current readings exceeds the current threshold, then the current manager 1640 may reduce the total current drawn by the blocks. For example, the current manager 1640 may reduce the total current by reducing the operating frequencies of one or more of the blocks, and/or shutting down one or more of the blocks.
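The per-block check and the total-current check described above can be combined in a short sketch. The block keys, the threshold values, and the policy of throttling the block with the largest draw when the total limit is exceeded are hypothetical assumptions. The text only states that the operating frequencies of one or more blocks may be reduced and/or blocks may be shut down.

```python
from typing import Dict, List

# Hypothetical limits in amperes, keyed by block reference number.
PER_BLOCK_LIMIT_A: Dict[str, float] = {"1411": 2.0, "1412": 2.0, "1413": 1.5, "1414": 1.5}
TOTAL_LIMIT_A = 5.0


def blocks_to_throttle(readings_a: Dict[str, float]) -> List[str]:
    """Return the blocks whose operating frequency should be reduced."""
    # Per-block check: any block drawing more than its own limit.
    throttle = [b for b, i in readings_a.items() if i > PER_BLOCK_LIMIT_A[b]]
    # Total check: sum of the readings (as formed by the multiplexer) versus the total limit.
    if sum(readings_a.values()) > TOTAL_LIMIT_A:
        largest = max(readings_a, key=readings_a.get)  # assumed policy: throttle largest draw
        if largest not in throttle:
            throttle.append(largest)
    return throttle


print(blocks_to_throttle({"1411": 1.8, "1412": 1.2, "1413": 1.6, "1414": 0.9}))
# ['1413', '1411']
```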
The exemplary binary search algorithm discussed above may be performed using the current sensors 1611-1614 to find a largest current draw on the power distribution network 1400. In a first step, a region 1610 is divided (partitioned) into two smaller regions. The region 1610 may have a boundary defined by the current sensors 1611-1614, as shown in
After computing the currents for regions 1650 and 1655, the current manager 1640 may compare the current for region 1650 with the current for region 1655 to determine which one is higher. The current manager 1640 may then continue the search in the region having the higher current.
For example, if the current monitoring system includes additional current sensors (not shown in
In this example, the current manager 1640 may compare the largest current draw with a current threshold, and perform current mitigation if the largest current draw exceeds the current threshold. If the largest current draw exceeds the current threshold, then the current manager 1640 may reduce the largest current draw by, for example, reducing the operating frequency of a block contributing to the largest current draw.
The adjustable clock source 1740 is configured to generate a clock signal for the block 1710 (e.g., processor), and to adjust the frequency of the clock signal (denoted “Clk”) under the control of the power manager 1720. The clock signal is output to the block 1710, which may use the clock signal for switching (toggling) transistors in the block. In this example, the frequency of the clock signal may correspond to an operating frequency of the block. Thus, the power manager 1720 can adjust (scale) the operating frequency of the block by adjusting the frequency of the clock signal output by the clock source 1740.
The adjustable power source 1750 is configured to provide an adjustable supply voltage (denoted “Vdd”) to the block 1710 (e.g., via the power distribution network 1400), and to adjust the supply voltage Vdd under the control of the power manager 1720. The power source 1750 may comprise a power management integrated circuit (PMIC). The block 1710 may use the supply voltage Vdd to power devices (e.g., transistors) in the block. Thus, the power manager 1720 can adjust (scale) the supply voltage of the block 1710 by adjusting the supply voltage Vdd provided to the block 1710 from the power source 1750.
The power manager 1720 may manage power based on instructions from the temperature manager 230. For example, if the temperature manager 230 makes a determination to perform temperature mitigation, then the temperature manager 230 may instruct the power manager 1720 to mitigate temperature for the block. In response, the power manager 1720 may reduce the frequency and/or supply voltage of the block 1710. Reducing the operating frequency, the supply voltage, or both reduces temperature by reducing the dynamic power dissipation of the block 1710. The power manager 1720 reduces the frequency of the block 1710 by reducing the frequency of the clock signal output by the adjustable clock source 1740 and reduces the supply voltage of the block by reducing the supply voltage output by the adjustable power source 1750.
The power manager 1720 may also manage power based on instructions from the current manager 1640. For example, if the current manager 1640 makes a determination to reduce current to the block 1710, then the current manager 1640 may instruct the power manager 1720 to mitigate current for the block. In response, the power manager may reduce the frequency of the block 1710. This reduces the current drawn by the block 1710 by reducing the switching activity of the block 1710. The power manager 1720 reduces the frequency of the block 1710 by reducing the frequency of the clock signal output by the adjustable clock source 1740.
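The effect of scaling frequency and voltage can be estimated with the standard first-order model of CMOS switching. This model is general background rather than something stated in this disclosure. In it, dynamic power is P ≈ a·C·V²·f and the corresponding average switching current is I ≈ a·C·V·f, where a is the activity factor and C is the switched capacitance. Below is a minimal sketch of the relative changes, with hypothetical scaling factors.

```python
def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
    """P_new / P_old for P ~ a*C*V^2*f, with activity and capacitance unchanged."""
    return v_scale ** 2 * f_scale


def relative_switching_current(v_scale: float, f_scale: float) -> float:
    """I_new / I_old for I ~ a*C*V*f, with activity and capacitance unchanged."""
    return v_scale * f_scale


# Hypothetical mitigation: 20% lower frequency, 10% lower supply voltage.
print(relative_dynamic_power(0.9, 0.8))      # approximately 0.648
print(relative_switching_current(0.9, 0.8))  # approximately 0.72
```

Reducing the voltage together with the frequency therefore lowers power more than frequency scaling alone. The frequency-only current mitigation (v_scale = 1.0) lowers the switching current in proportion to the frequency.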
In step 1810, temperature readings are received from a plurality of temperature sensors on a chip. For example, the temperature readings may be received by the multiplexer 220. Each temperature reading may be in the form of a voltage or a current that is a function of the temperature at the location of the respective one of the temperature sensors (e.g., temperature sensors 211-214).
In step 1820, an average or a sum of the temperature readings from the temperature sensors is determined. For example, the temperature readings may be averaged or summed by the multiplexer 220. In one example, the sum of the temperature readings may be determined by shorting the temperature sensors (e.g., using the multiplexer 520 shown in
In step 1830, a temperature at a location on the chip is computed based on the average or sum of the temperature readings. For example, the temperature at the location may be determined by the temperature manager 230. The location may be located at approximately a centroid of the locations of the temperature sensors, an estimated hotspot location on the chip, or another location on the chip.
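Steps 1810-1830 can be sketched in the digital domain as follows. This is a minimal sketch assuming the readings have already been converted to degrees and that the sensor positions are known. In the described system, the sum or average may instead be formed in the analog domain by the multiplexer 220. If only the sum is available, dividing by the number of sensors recovers the average. The function names and values are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def average_reading(readings: Sequence[float]) -> float:
    """Step 1820 (average variant): mean of the temperature readings."""
    return sum(readings) / len(readings)


def centroid(positions: Sequence[Point]) -> Point:
    """Centroid of the sensor locations, where the average is taken to apply (step 1830)."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


positions = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
readings = [70.0, 74.0, 72.0, 80.0]
print(centroid(positions), average_reading(readings))  # (1.0, 1.0) 74.0
```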
In step 1910, sensor readings are received from a plurality of sensors on a chip. The sensors may comprise temperature sensors (e.g., temperature sensors 1211-1219) or current sensors (e.g., current sensors 1611-1614).
In step 1920, a region of the chip is divided into a first region and a second region. For example, the region (e.g., 1210) may be divided into two regions (e.g., regions 1310 and 1320) of approximately equal area.
In step 1930, a first value is computed for the first region based on a first subset of the sensor readings. For example, the first value for the first region (e.g., region 1310) may be computed based on an average or sum of the sensor readings in the first subset (e.g., sensors 1211, 1212, 1213, 1214, 1215 and 1219).
In step 1940, a second value is computed for the second region based on a second subset of the sensor readings. For example, the second value for the second region (e.g., region 1320) may be computed based on an average or sum of the sensor readings in the second subset (e.g., sensors 1211, 1218, 1217, 1216, 1215 and 1219). The first and second subsets of sensor readings may be different.
In step 1950, the first value is compared with the second value.
In step 1960, the search is narrowed to the one of the first and second regions corresponding to the higher of the first and second values. For example, the search may be continued in the region with the higher value by dividing that region into two smaller regions and repeating the above steps for the two smaller regions.
The temperature manager 230, the current manager 1640 and the power manager 1720 may be implemented with one or more processors and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations discussed herein. The one or more processors may include general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gated logic, discrete hardware components, dedicated hardware finite state machines, or any combination thereof. The one or more memories may be internal to the one or more processors and/or external to the one or more processors. The one or more memories may include any suitable computer-readable media, including RAM, ROM, Flash memory, EEPROM, etc.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.