The present application relates, generally, to assigning processing threads to cores of a multi-core processor and, more specifically, to assigning processing threads based at least in part on distance between processing cores and temperature sensors.
A conventional computing device (e.g., smart phone, tablet computer, etc.) may include a system on chip (SOC), which has a processor and other operational circuits. Specifically, an SOC in a smart phone may include a processor chip within a package, where the package is mounted on a printed circuit board (PCB) internally to the phone. The phone includes an external housing and a display, such as a liquid crystal display (LCD). A human user when using the phone physically touches the external housing and the display.
As the SOC operates, it generates heat. In one example, the SOC within a smart phone may reach temperatures of 80° C.-100° C. Furthermore, conventional smart phones do not include fans to dissipate heat. During use, such as when a human user is watching a video on a smart phone, the SOC generates heat, and the heat is spread through the internal portions of the phone to the outside surface of the phone. Conventional smart phones include algorithms to control both the SOC temperature and the temperature of an outside surface of the phone by reducing a frequency of operation of the SOC when a temperature sensor on the SOC reaches a threshold level.
Demand for more performance in computing devices is increasing. One industry response to this demand has been the addition of more processor cores on an SOC to improve performance. The additional processor cores can provide higher performance, but the increase in processor cores may result in the use of more power, which leads to higher temperatures and shorter battery life. Higher temperatures and shorter battery life negatively impact reliability and user experience.
Regardless of the number of processor cores, most conventional user applications are written so that processing is concentrated in just two cores (e.g., dual processor core intensive), hence adding more processor cores may not directly translate into better user experience/performance. Further, some conventional applications are written to employ the resources of a graphics processing unit (GPU) rather than just relying on a central processing unit (CPU). However, heavy use of a GPU may result in generation of heat that affects surrounding processing units on the SOC, such as cores of the CPU, a modem, a digital signal processor (DSP), and the like. Therefore, there is a need in the art for computing systems employing multiple processing units to address heat generated by one processing unit that affects another processing unit while taking into account a number of cores that may be used by a given application.
Various embodiments are directed to circuits and methods that assign processing threads to queues of cores of a multicore processor based at least in part on a physical distance between the respective cores and a temperature sensor detecting a hot spot. For instance, one example embodiment detects a hot spot at a first processing unit (e.g., a GPU) and places a processing thread in a queue of a core at a second processing unit (e.g., a CPU) based at least in part on a distance between that core and a temperature detector associated with the hot spot.
According to one embodiment, a method includes: generating temperature information from a plurality of temperature sensors within a computing device, wherein a first one of the temperature sensors is physically located at a first processing unit of the computing device; processing the temperature information to identify that the first one of the temperature sensors is associated with temperature that is at or above a threshold; and assigning a processing thread to a first core of a plurality of cores of a second processing unit in response to identifying that the first one of the temperature sensors is associated with temperature that is at or above the threshold and based at least in part on a physical distance between the first core and the first one of the temperature sensors.
According to another embodiment, a system includes: a first processing unit configured to execute computer-readable instructions, wherein the first processing unit comprises a plurality of cores; a second processing unit configured to execute computer-readable instructions, wherein the first and the second processing units reside on a same substrate; and a temperature sensing device disposed within the second processing unit to measure a temperature at the second processing unit, wherein processing threads are assigned to one or more of the plurality of cores based, at least in part, on the temperature and a distance between each of the plurality of cores and the second processing unit.
According to another embodiment, a non-transitory computer readable medium having computer-readable instructions stored thereon, wherein the computer-readable instructions when executed by a first processing unit cause the first processing unit to: receive temperature information from a plurality of temperature sensing devices disposed within a semiconductor die including the first processing unit and a second processing unit, wherein a first one of the temperature sensing devices is disposed within the second processing unit; determine from the temperature information that a temperature sensed by the first one of the temperature sensing devices is above a threshold; and in response to determining that the temperature is above the threshold, assign a processing thread to either a first core of the first processing unit or a second core of the first processing unit based at least in part on respective distances of the first core and second core from the first one of the temperature sensing devices.
According to another embodiment, a computing device implemented on a semiconductor die, the computing device includes: first means for executing processing threads, wherein the first means includes a multi-core processing unit; second means for executing processing threads; means for sensing temperature at the second means; means for determining that a temperature sensed by the temperature sensing means exceeds a threshold; and means for assigning a processing thread to a first core of the first means in response to determining that the temperature exceeds the threshold and based at least in part on a physical distance within the semiconductor die of the first core to the temperature sensing means.
Various embodiments provided herein include systems and methods to schedule cores in a first processing unit (e.g., a CPU) in response to temperature measurements and physical distance from a second processing unit (e.g., a GPU).
In one embodiment, an SOC may include a variety of different processing units, such as a CPU, a GPU, a DSP, a modem, and the like. Each of the different processing units may include one or more temperature sensors that measure temperature and provide that temperature information to a control system of the chip. For example, the control system of the chip may include one or more algorithms as part of a kernel or even higher up in an operating system stack. One of those algorithms may include a core scheduler, which assigns threads to cores of the CPU.
As an application is run in the system, the core scheduler determines cores of the CPU to handle individual ones of the threads. The core scheduler may use any of a multitude of criteria to prioritize cores to receive threads, such as core temperature, capabilities of the core, and the like. In one embodiment, the core scheduler takes into account a temperature reading at another processing unit, such as the GPU, and physical distance on the chip between the measured hot spot of the other processing unit and individual ones of the cores. It is generally assumed in this example that a larger physical distance between an individual core and a hot spot on the other processing unit would correlate with lower thermal effects at that particular core attributable to the hot spot. Of course, other factors may be taken into account, such as temperature of an individual core itself. The core scheduler assigns threads to an individual core based at least in part on physical distance between that core and the detected hot spot.
Continuing with the example, the SOC includes a storage device (e.g., non-volatile memory, such as flash memory) to store a table that relates physical distance of individual cores to a particular hot spot. For example, the table may include an entry for a particular temperature sensor and fields associated with that entry to indicate a core that is farthest from the temperature sensor, a core that is second farthest from the temperature sensor, a core that is third farthest from the temperature sensor, and so on. As the scheduler receives new threads to assign or performs a periodic rebalancing, it reads information from a variety of different temperature sensors, including sensors at processing devices other than the particular multi-core processing device. If the scheduler detects a particular hot spot, then the scheduler may consult the table and assign one or more threads to a core that is indicated by the table as being physically remote from the detected hot spot.
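By way of illustration only, the table lookup described above may be sketched in Python as follows. This is a minimal sketch rather than the disclosed implementation: the sensor names, the four-core layout, and the orderings in `DISTANCE_TABLE` are hypothetical values chosen for the example, as are the function names.

```python
# Hypothetical distance table: for each temperature sensor, the cores of the
# multi-core unit listed from farthest to nearest. The entries are invented
# for illustration; real values would come from the chip floorplan.
DISTANCE_TABLE = {
    "sensor_a": [0, 3, 1, 2],
    "sensor_b": [2, 1, 3, 0],
}


def cores_by_distance(sensor_id, table=DISTANCE_TABLE):
    """Return a copy of the farthest-first core ordering for the sensor."""
    return list(table[sensor_id])


def farthest_core(sensor_id, table=DISTANCE_TABLE):
    """Return the core the table lists as farthest from the sensor."""
    return table[sensor_id][0]
```

Storing the cores in farthest-first order means the scheduler needs only an index into the entry, with no distance arithmetic at run time.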
Various embodiments may be performed by hardware and/or software in a computing device. For instance, some embodiments include hardware and/or software algorithms performed by a processor, which can be part of an SOC, in a computing device as the device operates. Various embodiments may further include nonvolatile or volatile memory set aside in an integrated circuit chip in a computing device to store the tables correlating physical core distance with respect to multiple cores and multiple temperature sensors.
Although not shown in
CPU 310 is a separate processing unit from GPU 320 and separate from the DSPs 340 and 350. Furthermore, CPU 310 is physically separate from GPU 320 and from the DSPs 340, 350, as indicated by the space between those components in the illustration of
Further in this example, CPU 310 executes computer readable code to provide the functionality of a CPU scheduler. For instance, in this example the CPU scheduler includes firmware that is executed by one or more of the cores of CPU 310 as part of an operating system kernel. Of course, various embodiments may implement a CPU scheduler in other appropriate ways, such as part of a higher-level component of an operating system stack. Operation of the CPU scheduler is explained in more detail below.
The placement of the components on the SOC 300 may have an effect on the performance of the components, particularly their operating temperatures. When the SOC 300 is operational, the various components 310-380 generate heat, where that heat dissipates through the material of the semiconductor die. The operating temperature of a component may be affected by its own power dissipation (self-heating) and the temperature influence of surrounding components (mutual-heating). A mutual heating component may include anything on the SOC 300 that produces heat. Thus, the operating temperature of each component on the SOC 300 may depend on its placement with respect to heat sinks and to the other components on the SOC 300 generating heat. For example, the CPU 310 and the GPU 320 may both generate significant heat when a graphics-intensive application is executing. Where these components are placed close together, one may cause the performance of the other to suffer due to the heat it produces during operation. Thus, as shown in
CPU 310 in this example also includes thermal mitigation algorithms, which measure temperature throughout the SOC 300 and may reduce an operating voltage or an operating frequency of one or more components in order to reduce heat generated by such components when a temperature sensor indicates a hot spot. Accordingly, SOC 300 includes temperature sensors located throughout. Example temperature sensors are shown labeled TJ1-TJ6. Temperature sensors TJ1 and TJ2 are implemented within GPU 320, whereas the temperature sensors labeled TJ3-TJ6 are implemented within CPU 310. The scope of embodiments is not limited to any particular placement for the temperature sensors, and other embodiments may include more or fewer temperature sensors and temperature sensors in different places. For instance, other embodiments may include temperature sensors at any of components 330-380, on a PCB, or other appropriate location. The temperature sensors themselves may include any appropriate sensing device, such as a ring oscillator.
TJ stands for junction temperature, and at any given time a junction temperature refers to a highest temperature reading by any of the sensors. For instance, if the temperature sensor TJ2 reads the highest temperature out of the six temperature sensors, then the value of that temperature reading is the junction temperature. As SOC 300 operates, the junction temperature may change, and the particular sensor reading the junction temperature may change.
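As a minimal sketch of the definition above, the junction temperature may be computed as the maximum over the current sensor readings. The function name and the reading values below are hypothetical and are used only for illustration.

```python
def junction_temperature(readings):
    """Return (sensor_name, temperature) for the highest reading.

    `readings` maps sensor names to temperatures in degrees Celsius.
    """
    if not readings:
        raise ValueError("no sensor readings available")
    sensor = max(readings, key=readings.get)
    return sensor, readings[sensor]


# Illustrative readings only; the values are not taken from the disclosure.
readings = {"TJ1": 71.0, "TJ2": 88.5, "TJ3": 64.0,
            "TJ4": 66.5, "TJ5": 63.0, "TJ6": 65.0}
sensor, tj = junction_temperature(readings)
```

With these sample values, sensor TJ2 supplies the junction temperature; a later set of readings may select a different sensor.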
In this example, CPU 310 provides functionality to control the heat produced within SOC 300 by temperature mitigation algorithms, which monitor the temperatures at the various sensors, including a junction temperature, and take appropriate action. For instance, one or more temperature mitigation algorithms may track the temperatures at the temperature sensors and reduce a voltage and/or a frequency of operation of any one of the components 310-380, or even an individual CPU core, when the junction temperature meets or exceeds one or more set points or thresholds. Additionally, in the embodiment of
During normal operation of the computing device 100 (
In one example operation, the CPU scheduler takes into account physical distance from a detected hot spot by consulting a table that includes fields that correlate the temperature sensor associated with the detected hot spot with respective physical distances to the various cores. An example is shown in
Continuing with the operational example, the CPU scheduler is tasked with placing a particular processing thread with a CPU core. If the CPU scheduler detects a hot spot that corresponds to either one of the temperature sensors TJ1 or TJ2, the CPU scheduler may then access Table 400, parse the contents to identify the particular temperature sensor associated with the hot spot and determine relative physical placements of the cores with respect to the temperature sensor. The CPU scheduler may further rank the cores based on relative physical distance, ranking Core 0 the highest and Core 2 the lowest with respect to this particular criterion. Of course, the CPU scheduler may take into account other criteria as well. However, assuming that no other criteria overrule the physical distance from the detected hot spot, the CPU scheduler then assigns the processing thread to Core 0. In some examples, applications are written to execute on two cores of a CPU, and in such an example the CPU scheduler may assign the first processing thread to Core 0 and then assign a processing thread of the same application to Core 3 because Core 3 is the second furthest CPU core.
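The ranking and two-thread placement described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the text gives Core 0 as farthest, Core 3 as second farthest, and Core 2 as nearest, while the four-core CPU, the placement of Core 1 in third position, the use of one ordering for both TJ1 and TJ2, and the names `TABLE_400` and `assign_threads` are assumptions made for the example.

```python
# Farthest-first core order for hot spots at TJ1 or TJ2. Core 0 first,
# Core 3 second, and Core 2 last follow the text; a four-core CPU with
# Core 1 in third place is an assumption made for this sketch.
TABLE_400 = {
    "TJ1": [0, 3, 1, 2],
    "TJ2": [0, 3, 1, 2],
}


def assign_threads(threads, hot_sensor, table=TABLE_400):
    """Map each thread to a core, walking the farthest-first ranking.

    When there are more threads than cores, the walk wraps around.
    """
    ranking = table[hot_sensor]
    return {thread: ranking[i % len(ranking)]
            for i, thread in enumerate(threads)}


placement = assign_threads(["app_thread_1", "app_thread_2"], "TJ2")
```

Here the first thread of the application lands on Core 0 and the second on Core 3, assuming no other criterion overrules physical distance.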
In various embodiments, the SOC 300 stores Table 400 in nonvolatile memory that is available to the various processing units 310-380, or at least available to CPU 310 that is executing a kernel or other operating system functionality. The CPU scheduler is programmed to access an address in the nonvolatile memory that corresponds to Table 400 when appropriate. Table 400 may be written to the nonvolatile memory during manufacture of the computing device 100 or even following manufacture of SOC 300 but before manufacture of computing device 100 itself. Specifically, the information in Table 400 is known from the design phase of the SOC 300 and thus may be written to the nonvolatile memory as early or as late as is practicable.
Various embodiments may include one or more advantages over conventional systems. For instance, various conventional systems rely more heavily on a CPU processing unit than on a GPU processing unit. Thus, a hot spot or junction temperature was more likely to occur at the CPU processing unit in such conventional systems. However, applications more recently have begun to use enough processing power of the GPU that a GPU may generate enough heat to result in a junction temperature from time to time. And while some conventional systems were capable of taking into account temperatures within the CPU processing unit when assigning processing threads in a CPU core, such conventional systems were not capable of taking into account temperatures of neighboring processing units.
By contrast, various embodiments described herein take into account a temperature of a neighboring processing unit when scheduling threads to a core in another processing unit. For instance, in the examples of
In a system that includes temperature mitigation algorithms, a temperature mitigation algorithm may reduce operating voltage or operating frequency for a particular processing core or an entire processing unit in response to detected temperature or temperature increases rising above a predetermined limit. Thus, a processor core closest to a heat-generating processing unit would be expected to have a shorter time to mitigation and resulting lower performance. Various embodiments may increase time to mitigation for the various processor cores by reducing temperature or temperature increases from neighboring processing units.
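For illustration, a single step of such a mitigation algorithm may be sketched as below. The step size, the frequency floor, the function name, and the choice to act when the temperature is at or above the limit are all assumptions of this sketch; the disclosure does not specify them.

```python
def mitigate(freq_mhz, temperature_c, limit_c, step_mhz=100, floor_mhz=300):
    """Return the operating frequency after one mitigation check.

    The frequency drops by one step when the temperature is at or above
    the limit, but never below the floor; otherwise it is unchanged.
    All numeric defaults are hypothetical.
    """
    if temperature_c >= limit_c:
        return max(floor_mhz, freq_mhz - step_mhz)
    return freq_mhz
```

A core that stays below the limit for longer keeps its full frequency for longer, which is the sense in which distance-aware placement may increase time to mitigation.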
A flow diagram of an example method 500 for scheduling processing threads among the cores of a multi-core processing unit is illustrated in
The embodiment of
In another example, the CPU scheduler performs a load balancing operation to spread processing threads among the available cores to optimize efficiency. Such load balancing may be performed at regular intervals, e.g., every 50 ms. Of course, the scope of embodiments is not limited to any particular interval for performing load balancing. In these examples, method 500 may be performed at the regular interval for load balancing and also may be performed between the load balancing intervals as new processing threads are received by the CPU scheduler as a result of new applications opening up or new media being consumed.
At action 510, the CPU scheduler reads temperature sensing data from the temperature sensors at the integrated circuit chip, such as SOC 300. Examples are shown above at
At action 520, the CPU scheduler determines whether a temperature reading (“T”) is above a programmed threshold (Tthreshold). If there are no temperature readings above the threshold, the method 500 moves to action 550 (described later). However, if a hot spot is detected by determining that a temperature reading is above the threshold, then the CPU scheduler moves to action 530.
In this example, a hot spot includes a physical location corresponding to a temperature sensor that is sensing a temperature the same as or greater than the threshold. The hot spot temperature may be a calculated value based on a temperature sensor reading and, in some embodiments, may also include an offset temperature delta to account for the difference between the actual hot spot temperature on the silicon and the temperature at the sensor location. At action 530, the CPU scheduler determines whether the hot spot is inside or is outside the CPU. For instance, various embodiments may include a table or other data structure associating temperature sensors with processing units. Action 530 may include consulting such a table to determine where the hot spot is located. If the hot spot is inside the CPU, then the CPU scheduler proceeds to action 550 by placing the processing thread in a queue of a core selected according to various criteria, such as quiescent current (Iddq), temperatures of respective cores (e.g., by placing a processing thread at a core having a lowest temperature among the various cores), location of core within the CPU itself, and/or the like.
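The hot spot test and the inside/outside determination of action 530 may be sketched as follows. The sensor-to-unit mapping follows the example placement described earlier (TJ1 and TJ2 within the GPU, TJ3 through TJ6 within the CPU); the function names and the numeric values used in any call are hypothetical.

```python
# Mapping of sensors to the processing unit that contains them, following
# the example placement (TJ1-TJ2 in the GPU, TJ3-TJ6 in the CPU).
SENSOR_UNIT = {"TJ1": "GPU", "TJ2": "GPU",
               "TJ3": "CPU", "TJ4": "CPU", "TJ5": "CPU", "TJ6": "CPU"}


def hot_spot_temperature(reading_c, offset_c=0.0):
    """Estimate the hot spot temperature as sensor reading plus an offset."""
    return reading_c + offset_c


def is_hot_spot(reading_c, threshold_c, offset_c=0.0):
    """A hot spot is an estimated temperature at or above the threshold."""
    return hot_spot_temperature(reading_c, offset_c) >= threshold_c


def hot_spot_outside_cpu(sensor_id, sensor_unit=SENSOR_UNIT):
    """Return True when the sensor belongs to a unit other than the CPU."""
    return sensor_unit[sensor_id] != "CPU"
```

For example, a reading of 97 degrees with a 3 degree offset meets a 100 degree threshold, whereas the same reading with no offset does not.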
However, if it is determined at action 530 that the hot spot is outside of the CPU, then the CPU scheduler moves to action 540. An example of determining that the hot spot is outside of the CPU includes measuring a temperature at one of the temperature sensors of the GPU 320 of
As noted above with respect to
Method 500 continues, as the CPU scheduler continually receives temperature sensing data and also either places new threads or rebalances threads. Accordingly, normal operation of SOC 300 may include repeating method 500 as new threads are received or load-balancing operations are performed and until the device is powered off.
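One pass through the decision flow of method 500 may be sketched as below. This is an illustrative sketch only: action 540 is taken here to mean distance-based placement, consistent with the table discussion above, and the lowest-temperature core stands in for the various criteria of action 550. The function name, parameter names, and data shapes are assumptions.

```python
def select_core(readings, threshold_c, sensor_unit, distance_table, core_temps):
    """Return (core, reason) for one scheduling decision.

    readings:       sensor name -> temperature (action 510)
    sensor_unit:    sensor name -> containing processing unit
    distance_table: sensor name -> cores ordered farthest-first
    core_temps:     core number -> that core's own temperature
    """
    # Stand-in for the action 550 criteria: pick the coolest core.
    coolest = min(core_temps, key=core_temps.get)

    # Action 520: look for readings at or above the threshold.
    hot = {s: t for s, t in readings.items() if t >= threshold_c}
    if not hot:
        return coolest, "no hot spot"

    # Action 530: is the hottest qualifying sensor inside the CPU?
    hottest = max(hot, key=hot.get)
    if sensor_unit[hottest] == "CPU":
        return coolest, "hot spot inside CPU"

    # Action 540 (assumed): farthest core from the external hot spot.
    return distance_table[hottest][0], "hot spot outside CPU"
```

The scheduler would call such a routine each time a new thread arrives or a rebalancing interval elapses, using fresh sensor readings on every call.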
Action 610 includes generating temperature information from a plurality of temperature sensors within a computing device. An example is shown at
At action 620, the core scheduling algorithm processes the temperature information to identify that a first one of the temperature sensors is associated with temperature that is at or above a threshold. In some examples, a temperature threshold for a processing unit of an SOC may be 100° C., although the scope of embodiments may include any appropriate threshold temperature. The core scheduling algorithm compares the temperature information to the threshold and identifies a hot spot from the comparison. Action 620 may further include processing the temperature information to identify that other ones of the temperature sensors are not associated with temperatures at or above the threshold.
At action 630, the core scheduling algorithm accesses a data structure, such as Table 400 of
At action 640, the core scheduling algorithm determines, from the table, that a particular core is a furthest one of the plurality of cores from the first temperature sensor. An example is shown at
At action 650, the core scheduling algorithm places the thread in a queue of the particular core in response to determining that the particular core is the furthest one of the plurality of cores from the first temperature sensor.
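Actions 610 through 650 may be sketched together as follows. The per-core queues are modeled with `collections.deque`; the four-core layout, the table contents, the thread name, and the readings are hypothetical values for illustration, and the sketch acts on the first sensor found at or above the threshold.

```python
from collections import deque


def place_thread(thread, readings, threshold_c, distance_table, queues):
    """Queue `thread` on the core farthest from the first sensor whose
    reading is at or above the threshold. Return that core, or None if
    no sensor qualifies (in which case nothing is queued)."""
    for sensor, temp in readings.items():        # actions 610-620
        if temp >= threshold_c:
            core = distance_table[sensor][0]      # actions 630-640
            queues[core].append(thread)           # action 650
            return core
    return None


# Hypothetical four-core setup and farthest-first table.
queues = {core: deque() for core in range(4)}
table = {"TJ1": [0, 3, 1, 2], "TJ2": [0, 3, 1, 2]}
chosen = place_thread("decode", {"TJ1": 96.0, "TJ2": 101.5},
                      100.0, table, queues)
```

In this run only TJ2 meets the 100 degree threshold, so the thread is queued on the core listed first in the TJ2 entry.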
As the device operates during normal use, the core scheduling algorithm may continue to run, taking appropriate action as temperatures rise and fall and as threads are assigned or rebalanced.
The scope of embodiments is not limited to the specific method shown in
Also, in some embodiments the particular thread is associated with an application that includes other threads, and the application is programmed to use a certain subset of the cores (e.g., two of the cores). Accordingly, the core scheduling algorithm may place an additional thread from the same application with a different core, where the different core and the first core are grouped as the cores on which the application is processed. Therefore, the core scheduling algorithm may place the additional thread on the other core based at least in part on a distance of the other core to the hot spot and/or based at least in part on a distance of the first core to the hot spot.
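Where an application is limited to a subset of the cores, the distance ranking may simply be filtered to that subset, as in the sketch below. The function name, the table contents, and the allowed core set are hypothetical.

```python
def rank_allowed_cores(hot_sensor, allowed_cores, distance_table):
    """Return the application's allowed cores ordered farthest-first
    from the hot spot, preserving the ordering stored in the table."""
    return [core for core in distance_table[hot_sensor]
            if core in allowed_cores]


# Hypothetical example: the application may run only on Cores 1 and 3.
order = rank_allowed_cores("TJ1", {1, 3}, {"TJ1": [0, 3, 1, 2]})
```

The first thread of the application would go to the first core in `order` and the additional thread to the second, so both placements reflect distance from the hot spot.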
As those of some skill in this art will by now appreciate and depending on the particular application at hand, many modifications, substitutions and variations can be made in and to the materials, apparatus, configurations and methods of use of the devices of the present disclosure without departing from the spirit and scope thereof. In light of this, the scope of the present disclosure should not be limited to that of the particular embodiments illustrated and described herein, as they are merely by way of some examples thereof, but rather, should be fully commensurate with that of the claims appended hereafter and their functional equivalents.
The present application claims the benefit of U.S. Provisional Patent Application No. 62/423,805, filed Nov. 18, 2016, and entitled “Circuits and Methods Providing Thread Assignment for a Multi-Core Processor,” the disclosure of which is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
62423805 | Nov 2016 | US