This disclosure relates generally to thermal management of computing hardware and, more particularly, to methods and apparatus for in-field thermal calibration.
In recent years, as semiconductor fabrication technology has advanced, transistor sizes have become relatively smaller. Accordingly, packages such as system on chip (SOC) packages have smaller footprints and overall thicknesses. The reduced size of the SOC packages can improve performance and power efficiency, thereby resulting in an improved user experience. However, these advantages can come at a cost of a higher thermal density. In particular, an SOC package can experience a relatively higher rate of temperature change under certain conditions compared to past SOC implementations.
In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. The figures are not to scale. Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
As used herein, “processor circuitry” is defined to include (i) one or more special purpose electrical circuits structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmed with instructions to perform specific operations and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of processor circuitry include programmed microprocessors, Field Programmable Gate Arrays (FPGAs) that may instantiate instructions, Central Processor Units (CPUs), Graphics Processor Units (GPUs), Digital Signal Processors (DSPs), XPUs, or microcontrollers and integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of processor circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more DSPs, etc., and/or a combination thereof) and application programming interface(s) (API(s)) that may assign computing task(s) to whichever one(s) of the multiple types of the processing circuitry is/are best suited to execute the computing task(s).
Methods and apparatus for in-field thermal calibration are disclosed. Semiconductor packages, such as system on chip (SOC) packages, have become relatively smaller and thinner. However, those advantages come at the cost of a higher thermal density and, as a result, an SOC package could experience a relatively higher rate of temperature change under certain conditions. For example, it has been observed that a temperature rise of 20° C. can occur within a 1 millisecond (ms) window, which is approximately 20 times faster compared to prior SOC implementations.
A relatively high rate of temperature change imposes several implications on an SOC. For example, to ensure functionality and prevent damage thereto, the SOC is typically prevented from allowing its temperature to exceed certain limits, such as a junction temperature limit, for example. To this end, in known systems, a thermal guard band is typically employed in SOC thermal management algorithms. With these thermal guard bands, the SOC can prevent its temperature from going beyond its limits, thereby preventing damage or malfunction of the SOC. However, excessive or conservative limits of a guard band can adversely impact performance due to a lack of utilization of a thermal budget of the SOC. In particular, the limits can result in the SOC limiting its clock frequency more than necessary and, thus, providing relatively less performance. Conversely, the SOC can operate at a frequency that exceeds its thermal budget due to an inaccurate guard band, thereby potentially reducing an operating life thereof.
An additional aspect that is often overlooked with known systems is that thermal performance can diminish over time in the field. In the field, this overall reduction of thermal performance can be caused by a degradation of a thermal interface material positioned/layered between a processor/SOC die and a heat sink, for example. Moreover, fouling and/or clogging of heatsink fins can also result in diminished thermal performance. For instances where the thermal interface material degrades, a transient temperature response of the SOC can be adversely affected. In some known systems, approximately two years of operation can significantly decrease a thermal performance of the SOC and/or hardware associated with the SOC.
Examples disclosed herein enable effective adjustment of thermal settings of an SOC to account for variations in assembly, components, material and/or tolerances. Examples disclosed herein are able to uniquely characterize individual SOCs and/or associated hardware to accurately control operation of the SOC to individual or specific hardware implementations in the field. Accordingly, examples disclosed herein can more effectively utilize performance capabilities of the SOC by operating the SOC closer to a thermal overhead associated with the specific SOC. In contrast, known systems typically utilize a default thermal model that is based on overly conservative thermal characteristics and/or scenarios (e.g., worst-case scenarios). Examples disclosed herein can mitigate and/or adjust to the effects of hardware degradation over an operational life of the SOC.
Examples disclosed herein train a thermal model (e.g., a thermal transient model) of an SOC in three phases: (i) a monitoring phase, (ii) a calibration phase, and (iii) a publishing phase. According to examples disclosed herein, the model is trained to a specific instance, installation, part/component variation and/or assembly of the SOC so that performance of the SOC can be efficiently utilized. In other words, examples disclosed herein tailor performance of an SOC specific to unique and/or individual aspects of the SOC and/or hardware associated with the SOC (e.g., thermal devices, component variation, tolerance stack variation, housing configuration variation, air flow configuration variation, etc.).
Examples disclosed herein determine that a computing device and/or an SOC of the computing device is deployed in the field. According to examples disclosed herein, the SOC is deployed with a default first thermal model (e.g., a default thermal transient model), which includes conservative frequency setpoints. In response to the determination that the SOC is deployed, at least one temperature from a sensor (e.g., an on-die sensor) is monitored and associated with power usage during a time interval. In turn, a second thermal model is calibrated based on the at least one temperature and the power usage. In some examples, the second thermal model is calibrated by replacing and/or substituting data points of the first thermal model. In other words, the second thermal model can be a hybrid of data of the first thermal model (and its associated curves/data) and data obtained from the monitoring phase, for example. Accordingly, the second thermal model can be a modified version of the first thermal model. In some other examples, the second thermal model is generated as a new table or curve based on data points from the aforementioned monitoring phase (e.g., independent of the default first thermal model).
In some examples, the second thermal model is re-trained and/or newly generated after a time duration of the SOC being operated in the field (e.g., to define a third thermal model). In some examples, in addition to replacing some data points from the first thermal model, data points are added to the second thermal model to more accurately characterize the SOC. In some such examples, the data points are added until the second thermal model has converged and/or includes enough data points to approximate thermal transient characteristics of the computing device with a requisite degree of accuracy. In some examples, the sensor includes an on-die sensor of the SOC. In some such examples, the on-die sensor is associated with a computing unit (e.g., a core) of the SOC.
As used herein, the term “thermal model” refers to data that reflects assumed or measured thermal parameters and/or behavior associated with a device, component, assembly and/or system. Accordingly, the term “thermal model” can refer to a table, an array, a curve, an equation, etc. As used herein, the term “field” refers to an environment at which a computing device is to be operated during its operational life. Accordingly, the term “field” can refer to consumer and/or commercial use, for example, of the computing device as the computing device is used during its operational life. As used herein, the term “deployed” refers to a device and/or system being utilized during its operational life. As used herein, stating that a first object or item is “unitary” with a second object or item means that the first object is at least a source for the second object. Accordingly, stating that the first object is “unitary” with the second object does not necessitate that the first object and the second object are always identical (e.g., the second object can be a modified or updated version of the first object and vice-versa).
In this example, the SOC 102 includes multiple functional computing units 110, which are implemented as processor cores (e.g., logical cores, processing cores, compute cores, computing cores, etc.) in this example, and may include sensors (e.g., on-die sensors) 112. The sensors 112 can be associated with the SOC 102, at least one of the computing units 110, the motherboard 103 and/or any appropriate associated hardware of the computing device 101. In examples disclosed herein, the computing units 110 are further referred to as cores 110 for clarity. However, the computing units 110 do not necessarily have to be implemented by cores. For example, the computing units could be implemented by logical performance units of cache, memory, a bus controller, etc. In this example, the memory 104 is utilized for executing at least one thread of software 114 and/or firmware 116 by the SOC 102. The software 114 and/or the firmware 116 can be stored in a storage 120 of the devices 106. The devices 106 can include hardware and/or peripherals associated with and/or included by the computing device 101. In this example, interfaces 122 communicatively couple the SOC 102 to the devices 106 and the memory 104.
To accurately control a frequency and/or performance of the example SOC 102, examples disclosed herein implement a calibrated thermal model to adjust operation and/or control of the SOC 102. In particular, a thermal model (e.g., a thermal transient model) that corresponds to specific characteristics of the individual SOC 102 is calibrated to account for numerous parameters including, but not limited to, SOC performance variation, SOC component variation, processing variation, dimensional/size variation, cooling component differences, cooling configurations, computing device aspects, etc. This calibrated model can be based on a default thermal model that is initially used for control of the SOC 102. Upon calibration of the calibrated thermal model, the default thermal model is no longer utilized and a frequency of the SOC 102 and/or at least one of the cores 110 is controlled based on the calibrated thermal model.
In the illustrated example of
In this example, when calibration of the thermal model is completed, the calibrated thermal model is published for use by the SOC 102. In the illustrated example, the calibrated thermal model is published by replacing the default thermal model originally provided with the SOC 102 and/or the computing device 101 with the calibrated thermal model in operation. As a result, the SOC 102 is controlled with a thermal guard band that is relatively more accurate to individual conditions, variations and/or tolerances thereof. In other words, the example SOC 102 is operated and/or controlled based on individualized thermal parameters, performance and/or conditions.
In some examples, the computing device 101 is communicatively coupled to and/or part of a network 130. In some such examples, the calibrated thermal model can be transferred from the computing device 101 (e.g., for subsequent development and/or reiteration of further default thermal models). Additionally or alternatively, software that directs the computing device 101 to calibrate and/or generate thermal models is received at the computing device 101 from the network 130. However, any other appropriate computing and/or network topology can be implemented instead.
While the cores 110 are implemented as processor cores in this example, the cores 110 can be implemented as individual computing units of memory (e.g., random access memory, cache memory, a network controller, etc.), a device controller (e.g., a hard disk controller, a memory controller, a cache controller, etc.) or any other appropriate device that manages performance of discrete and/or individual computing units. In other words, examples disclosed herein are not solely limited to SOCs and/or SOC packages.
While the example of
In operation, during the aforementioned run-time frequency management stage 204, a frequency setpoint and/or setting of the SOC is determined. In turn, a power corresponding to the frequency setpoint is estimated. Further, a temperature and/or temperature change (e.g., a temperature delta) corresponding to the aforementioned power is estimated based on a default thermal model 206 having a default thermal characterization. The estimated temperature is, in turn, used to determine whether to throttle the SOC. If the SOC is to be throttled, the SOC is operated at a resolved frequency that is throttled from an initial frequency. If the SOC is not to be throttled, the resolved frequency is equal to the initial frequency. In contrast to examples disclosed herein, use of the default thermal model 206 often results in overly conservative operation and/or frequency control of the SOC such that the SOC is prevented from reaching its full potential. In particular, the default thermal model 206 often corresponds to a worst-case scenario for the SOC (e.g., a worst-case scenario for cooling, hardware differences, anticipated degradation, etc.).
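The run-time flow described above (frequency setpoint, then estimated power, then estimated temperature, then a throttle decision) can be illustrated with a minimal Python sketch. The function name `resolve_frequency`, the candidate frequency list, and the two estimator callables are hypothetical and are introduced only for illustration; they are not part of the disclosure, and a real implementation may resolve the frequency differently.

```python
from typing import Callable, Sequence


def resolve_frequency(
    candidates_mhz: Sequence[float],
    estimate_power_w: Callable[[float], float],
    estimate_delta_t_c: Callable[[float], float],
    current_temp_c: float,
    temp_limit_c: float,
) -> float:
    """Return the highest candidate frequency whose predicted temperature
    stays at or below the limit (hypothetical throttling policy)."""
    for freq in sorted(candidates_mhz, reverse=True):
        power = estimate_power_w(freq)                 # power for this setpoint
        predicted = current_temp_c + estimate_delta_t_c(power)  # thermal model
        if predicted <= temp_limit_c:
            return freq                                # no further throttling
    # Nothing fits the budget: fall back to the lowest candidate (assumption).
    return min(candidates_mhz)
```

Under these assumptions, a thermal model that predicts a smaller temperature rise per watt allows a higher resolved frequency for the same temperature limit, which is the effect a less conservative, calibrated model is intended to achieve.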
deltaT=Pactual*(Tf−Ts)/Peval (1)
where Tf: Final temperature during the evaluation window, Ts: Starting temperature during a time interval or evaluation window, Peval: Power input during the evaluation window, Pactual: power increase during a runtime window, and deltaT: Predicted temperature change during the evaluation window.
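As a worked illustration of equation (1), the following Python sketch computes the predicted temperature change from one evaluation-window measurement. The function and parameter names are hypothetical, and the numeric values in the usage note are invented solely to show the arithmetic.

```python
def predict_delta_t(p_actual_w: float, t_final_c: float,
                    t_start_c: float, p_eval_w: float) -> float:
    """Equation (1): deltaT = Pactual * (Tf - Ts) / Peval."""
    if p_eval_w <= 0:
        raise ValueError("Peval must be positive")
    return p_actual_w * (t_final_c - t_start_c) / p_eval_w
```

For example, if an evaluation window with a power input of 20 W raised the temperature from 50° C. to 70° C., a 10 W power increase at run time would be predicted to raise the temperature by 10 × (70 − 50) / 20 = 10° C.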
To collect data points (during monitoring) required for calibration and/or generation of the second thermal model 311, the sensor 112 of
According to the illustrated example of
The in-field calibration system 400 of the illustrated example includes example frequency controller circuitry 401, example monitoring analyzer circuitry 402, example power calculator circuitry 404, example thermal model generator circuitry 406, example calibrator circuitry 408 and example publisher circuitry 410. In the illustrated example, the in-field calibration system 400 is implemented by and/or is communicatively coupled to the SOC 102 and/or the sensor 112 of the computing device 101 shown in
The example frequency controller circuitry 401 is implemented to control a frequency of the SOC 102 and/or at least one of the cores 110 based on a thermal calibration model. In the illustrated example, the frequency controller circuitry 401 controls the frequency with a default first thermal model until a second thermal model, which may be unitary with the first thermal model, is calibrated and published. In some examples, the frequency controller circuitry 401 selects a frequency of the SOC 102 from a table and/or an array, for example.
In the illustrated example of
The example power calculator circuitry 404 calculates and/or estimates a power usage of the SOC 102. In particular, the power calculator circuitry 404 calculates and/or estimates the power usage based on a current draw of the SOC 102 in combination with a voltage associated with the SOC 102. However, any other appropriate measurement or calculation of the power can be implemented instead. In this example, the power calculated by the power calculator circuitry 404 is utilized as an input for a thermal model in use (i.e., either a default thermal model, a partially calibrated thermal model or a fully calibrated thermal model).
In some examples, the thermal model generator circuitry 406 of the illustrated example generates the second thermal model based on data points obtained from monitoring the SOC 102 with the example monitoring analyzer circuitry 402. In some such examples, the second thermal model is fully generated by the thermal model generator circuitry 406 and can be represented by a curve that relates temperature (e.g., a temperature increase, a temperature delta, etc.) to power and/or current. In other words, the second thermal model can be fully generated independent of the first thermal model, in some examples.
In this example, the calibrator circuitry 408 calibrates and/or develops the second thermal model (e.g., as a curve or table associated with the aforementioned data points). The example calibrator circuitry 408 can adjust and/or revise at least one data point of the first thermal model based on monitoring of the SOC 102 to yield the second thermal model. In particular examples, the calibrator circuitry 408 adds and/or replaces data points of the first thermal model with data points from monitoring the SOC 102. In other words, the second thermal model is generated by adding, replacing and/or substituting points pertaining to the default first thermal model, thereby defining a hybrid of the original data points and updated/newer data points obtained from monitoring the SOC, for example. In some examples, the calibrator circuitry 408 determines whether the second thermal model is ready (e.g., fully calibrated, converged, etc.) for in-field operation and, thus, publishing for that purpose. In some examples, the second thermal model is continuously updated (e.g., data points are constantly revised and/or added during in-field operation of the SOC 102) by the calibrator circuitry 408. In some such examples, the second thermal model is utilized by the SOC 102 while being partially calibrated (e.g., the second thermal model is a partially calibrated version of the first thermal model).
The example publisher circuitry 410 publishes and/or enables utilization of the calibrated second thermal model by the SOC 102 and/or at least one of the cores 110. The publication may occur in response to the determination that the second thermal model is ready for utilization by the SOC 102. In some examples, the publisher circuitry 410 determines whether an updated or newer thermal model (e.g., a third thermal model) is necessitated (e.g., during an in-field operational life of the computing device 101). This determination may be based on whether the second thermal model has sufficient convergence, whether sufficient data points are obtained for calibration of the second thermal model, whether a requisite time in the field has elapsed (e.g., six months to two years after an initial field deployment of the computing device 101), whether a significant shift in thermal performance of the computing device 101 and/or the SOC 102 has occurred, etc.
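The determination of whether an updated thermal model is necessitated can be sketched as a simple predicate over the criteria listed above. This is only an illustration: the `ModelStatus` record, the function name, and every threshold value below are assumptions chosen for the example and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    converged: bool            # whether the current model has converged
    calibrated_points: int     # number of calibrated data points
    days_in_field: int         # elapsed time since deployment
    observed_drift_c: float    # measured minus predicted temperature change


def needs_recalibration(status: ModelStatus,
                        min_points: int = 8,          # assumed threshold
                        max_age_days: int = 365,      # assumed threshold
                        drift_limit_c: float = 3.0    # assumed threshold
                        ) -> bool:
    """Return True if any of the listed criteria calls for a newer model."""
    return (not status.converged
            or status.calibrated_points < min_points
            or status.days_in_field >= max_age_days
            or abs(status.observed_drift_c) >= drift_limit_c)
```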
As can be seen in this example, a set of discrete points approximates an actual curve represented by the equation (1) mentioned above in connection with
Turning to
According to examples disclosed herein, for monitoring (and to obtain data points for calibration), there are two approaches for data collection: an active approach, and a passive approach. In the active approach, software provides identified workloads (e.g., sufficient to cover desired data points periodically during the calibration phase, workloads designed only to cause the SOC 102 to operate without any usable program output), for example. In some examples, performance of the workload will be repeated multiple times to account for errors and/or variation (e.g., run to run errors).
In the aforementioned example passive approach, an on-die monitor or sensor samples data during normal/routine operation (e.g., user-directed control). Accordingly, this example approach can have a dependency on actual system operation to finalize values of each of those discrete points. Accordingly, some of the discrete points may not be calibrated specifically.
Turning to
In some examples, to ensure a relatively high-quality and accurate calibration of the second thermal model and to reduce impacts from run-to-run error, a minimum/requisite number of samples is evaluated for each data point prior to completion of calibration of that data point. In other words, multiple samples for each data point can improve an accuracy of the second thermal model.
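One way to enforce a minimum number of samples per data point is sketched below in Python. The `SampleStore` class, the choice of three samples as the minimum, and the use of an arithmetic mean to combine samples are all assumptions made for illustration; the disclosure does not specify how samples are combined.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

MIN_SAMPLES = 3  # assumed minimum; not a value taken from the disclosure


class SampleStore:
    """Accumulates temperature-delta samples per power data point."""

    def __init__(self, min_samples: int = MIN_SAMPLES) -> None:
        self.min_samples = min_samples
        self._samples: Dict[float, List[float]] = defaultdict(list)

    def add(self, power_w: float, delta_t_c: float) -> None:
        self._samples[power_w].append(delta_t_c)

    def is_calibrated(self, power_w: float) -> bool:
        # .get avoids creating an empty entry for an unseen data point.
        return len(self._samples.get(power_w, [])) >= self.min_samples

    def calibrated_points(self) -> Dict[float, float]:
        """Averaged value for each data point with enough samples."""
        return {p: mean(v) for p, v in self._samples.items()
                if len(v) >= self.min_samples}
```

Data points that have not yet reached the minimum are simply omitted from `calibrated_points()`, so they can remain at their default values until more samples arrive.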
In this example, calibration of each of the discrete data points shown in
An example approach depicted in the following pseudo code is proposed in examples disclosed herein:
In this example, there are two steps. In a first example step, default settings are replaced with the calibrated settings, if applicable (e.g., the data point is calibrated). In the second example step, over-pessimistic data points are filtered out due to being uncalibrated. This filtering can occur during calibration when there is insufficient data to calibrate each data point and, hence, some data points remain at their default values (e.g., originally provided values with the first thermal model), for example. Given that the default values are often related to a predicted worst-case scenario, for example, it is probable that some default values are greater than their true values. This can lead to instances where some data points have a greater value than a value of a data point on its right. Given an assumption that the curve should increase monotonically as power increases, the over-pessimistic points may be replaced with the values from adjacent greater value data points, as shown in the example step-2 of the pseudo code above.
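The two steps described above can be sketched in Python as follows. This is an illustrative reconstruction from the prose only, not the original pseudo code. It assumes the model is a table mapping power (W) to temperature delta (° C.), and it interprets the adjacent data point as the neighbor at the next-higher power; the function name `publish_points` is hypothetical.

```python
from typing import Dict


def publish_points(default_points: Dict[float, float],
                   calibrated_points: Dict[float, float]) -> Dict[float, float]:
    """Merge calibrated values into the default table, then clamp
    over-pessimistic uncalibrated points to keep the curve monotonic."""
    powers = sorted(default_points)

    # Step 1: replace default values with calibrated values where available.
    merged = {p: calibrated_points.get(p, default_points[p]) for p in powers}

    # Step 2: sweep from the highest power down. An uncalibrated point that
    # exceeds its right-hand neighbor is assumed to be over-pessimistic and
    # takes the neighbor's value.
    for i in range(len(powers) - 2, -1, -1):
        p, right = powers[i], powers[i + 1]
        if p not in calibrated_points and merged[p] > merged[right]:
            merged[p] = merged[right]
    return merged
```

For instance, with default points {5: 4.0, 10: 8.0, 15: 12.0, 20: 16.0} and calibrated points {10: 5.0, 20: 11.0}, the uncalibrated default at 15 W (12.0) exceeds the calibrated value at 20 W (11.0) and is therefore lowered to 11.0, giving {5: 4.0, 10: 5.0, 15: 11.0, 20: 11.0}.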
Once the second thermal model is calibrated, as shown in
While an example manner of implementing the in-field calibration system 400 of
Flowcharts representative of example hardware logic circuitry, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the in-field calibration system 400 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., as portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of machine executable instructions that implement one or more operations that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example operations of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
At block 604, the example monitoring analyzer circuitry 402 determines whether the SOC 102 and/or the computing device 101 is deployed in the field. In some examples, this determination is based on whether user applications are being run and/or whether a user has indicated that the computing device 101 is in the field (e.g., whether the computing device 101 has entered an operational phase). In some examples, the monitoring analyzer circuitry 402 determines that the computing device 101 and/or the SOC 102 is deployed when the computing device 101 is first turned on in the field subsequent to production.
In the illustrated example of
At block 608, as will be discussed in greater detail below in connection with
At block 610, as will be discussed in greater detail below in connection with
At block 612, in some examples, the calibrator circuitry 408 and/or the publisher circuitry 410 determines whether to re-adjust the second thermal model. If the second thermal model is to be re-adjusted (block 612), control of the process returns to block 606. Otherwise, the process ends. The determination can be based on whether the computing device 101 has been operated for a requisite amount of time and/or whether thermal properties and/or performance of the computing device 101 and/or the SOC 102 have changed significantly.
At block 702, the frequency controller circuitry 401 and/or the example power calculator circuitry 404 controls a frequency and/or a power setting of the SOC 102 and/or at least one of the cores 110. In some examples, the frequency setting is based on a pre-designated workload that operates the SOC 102 at frequency ranges intended to provide a sufficient range of data points associated with the SOC 102 corresponding to numerous different frequencies and/or frequency settings.
At block 704, the example monitoring analyzer circuitry 402 determines and/or measures a beginning or start temperature of the SOC 102 and/or the core 110 measured by the sensor 112 at the start of a time interval or evaluation window (e.g., the start temperature is measured as the SOC 102 begins to execute instructions and/or a workload).
At block 706, the example power calculator circuitry 404 determines a power usage of the SOC 102 and/or at least one of the cores 110. In this example, the power calculator circuitry 404 utilizes a voltage and drawn current of the SOC 102 to determine the power usage during the time interval.
At block 708, in some examples, the power calculator circuitry 404 determines an increase in power of the SOC 102. In this example, the increase in power occurs over the aforementioned time interval.
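The power determinations of blocks 706 and 708 can be sketched as follows. The sketch assumes that power is the product of a sampled voltage and a sampled drawn current, and that the increase in power is the difference between samples taken at the start and the end of the time interval. The function names and units are hypothetical.

```python
def power_usage_watts(voltage_v: float, current_a: float) -> float:
    """Block 706 (sketch): power usage from voltage and drawn current, P = V * I."""
    return voltage_v * current_a


def power_increase_watts(power_start_w: float, power_end_w: float) -> float:
    """Block 708 (sketch): change in power over the time interval."""
    return power_end_w - power_start_w


# Hypothetical samples taken at the start and the end of an interval.
start_w = power_usage_watts(voltage_v=0.75, current_a=20.0)
end_w = power_usage_watts(voltage_v=1.0, current_a=25.0)
delta_w = power_increase_watts(start_w, end_w)
```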
At block 710, the power calculator circuitry 404 of the illustrated example determines a final temperature associated with the SOC 102. In this example, the final temperature corresponds to a temperature measured by the sensor 112 at an end of the time interval.
At block 712, it is determined whether additional data points are to be obtained to refine and/or re-define the calibrated second thermal model. If additional points are to be obtained (block 712), control of the process returns to block 702. Otherwise, the process ends/returns.
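The monitoring loop of blocks 702 through 712 can be summarized in a short sketch. The callables `set_frequency`, `read_temp_c`, `read_power_w`, and `run_interval` are hypothetical stand-ins for the frequency controller circuitry 401, the sensor 112, the power calculator circuitry 404, and execution of a workload over the time interval, respectively. The `DataPoint` record and its fields are likewise assumptions and do not represent a data layout defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DataPoint:
    """One hypothetical sample gathered over a single time interval."""
    frequency_mhz: float
    start_temp_c: float
    power_w: float
    final_temp_c: float

    @property
    def temp_rise_c(self) -> float:
        # Temperature change observed over the interval.
        return self.final_temp_c - self.start_temp_c


def collect_data_points(frequencies_mhz: Iterable[float],
                        set_frequency: Callable[[float], None],
                        read_temp_c: Callable[[], float],
                        read_power_w: Callable[[], float],
                        run_interval: Callable[[], None]) -> List[DataPoint]:
    points: List[DataPoint] = []
    for freq in frequencies_mhz:      # block 712: repeat while points are needed
        set_frequency(freq)           # block 702: control the frequency setting
        start = read_temp_c()         # block 704: start temperature
        run_interval()                # workload executes during the interval
        power = read_power_w()        # blocks 706/708: power over the interval
        final = read_temp_c()         # block 710: final temperature
        points.append(DataPoint(freq, start, power, final))
    return points
```

In this sketch, the list of frequencies plays the role of the pre-designated workload schedule, so the loop ends when the schedule is exhausted.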
At block 802, the monitoring analyzer circuitry 402 of the illustrated example selects and/or obtains data points acquired during monitoring of the SOC 102.
At block 804, the example calibrator circuitry 408 applies data (e.g., in the form of added or updated data points) to define and/or augment the second thermal model and/or a curve associated with the second thermal model, such as the examples shown and described above in connection with
At block 806, in some examples, the example calibrator circuitry 408 determines a fit of the aforementioned curve. In such examples, the calibrator circuitry 408 may fit an equation to the curve (e.g., the curve is approximated via a polynomial fit or a linear regression of a portion thereof).
At block 808, in some examples, the calibrator circuitry 408 determines a fit of the equation or other representation of the second thermal model relative to data points of the second thermal model. The fit may correspond to a degree to which the equation correlates to the data points of the second thermal model (e.g., similar to that of a linear regression fit value).
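As one illustration of blocks 806 and 808, a least-squares linear fit and a coefficient of determination can be computed as sketched below. The sketch assumes that the curve relates power usage (x) to temperature rise (y) and that a straight line is an adequate approximation. A polynomial fit, as mentioned above, would follow the same pattern. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Block 806 (sketch): least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs: Sequence[float], ys: Sequence[float],
              slope: float, intercept: float) -> float:
    """Block 808 (sketch): degree to which the equation matches the data points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
```

A fit value near 1 would indicate that the equation closely tracks the data points, whereas a low value could be one signal, in such a sketch, that additional data points are to be obtained.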
At block 810, it is determined by the calibrator circuitry 408 whether to repeat the process. If the process is to be repeated (block 810), control of the process proceeds to “A”, which corresponds to block 606.
At block 902, the example publisher circuitry 410 replaces the default first thermal model with the calibrated second thermal model. In the illustrated example, the second thermal model is originally based on the first thermal model such that data points of the first thermal model have been replaced with newer data points corresponding to monitoring of the individual SOC 102 to define the second thermal model.
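The replacement of default data points with monitored data points can be sketched as follows. The sketch assumes that each thermal model is stored as a mapping from a power level in watts to a temperature rise in degrees Celsius. That representation, and the function name, are assumptions made only for illustration.

```python
from typing import Dict


def calibrate_from_default(default_points: Dict[float, float],
                           measured_points: Dict[float, float]) -> Dict[float, float]:
    """Start from the default first model and overwrite entries with measured data.

    Power levels that were not observed in the field keep their default values,
    so the result remains usable before every operating point has been measured.
    """
    calibrated = dict(default_points)   # copy so the default model is left untouched
    calibrated.update(measured_points)  # newer data points replace default ones
    return calibrated


# Hypothetical default model and field measurements.
first_model = {5.0: 8.0, 10.0: 16.0, 15.0: 24.0}
field_data = {10.0: 13.5, 15.0: 21.0}
second_model = calibrate_from_default(first_model, field_data)
```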
At block 904, in some examples, the curve associated with the second thermal model is rebuilt and/or modified by the calibrator circuitry 408. For example, data points of the second thermal model may be filtered and/or removed.
At block 906, in some examples, the curve is verified and the process of
The processor platform 1000 of the illustrated example includes processor circuitry 1012. The processor circuitry 1012 of the illustrated example is hardware. For example, the processor circuitry 1012 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The processor circuitry 1012 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the processor circuitry 1012 implements the example frequency controller circuitry 401, the example monitoring analyzer circuitry 402, the example power calculator circuitry 404, the example thermal model generator circuitry 406, the example calibrator circuitry 408, and the example publisher circuitry 410.
The processor circuitry 1012 of the illustrated example includes a local memory 1013 (e.g., a cache, registers, etc.). The processor circuitry 1012 of the illustrated example is in communication with a main memory including a volatile memory 1014 and a non-volatile memory 1016 by a bus 1018. The volatile memory 1014 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 1016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1014, 1016 of the illustrated example is controlled by a memory controller 1017.
The processor platform 1000 of the illustrated example also includes interface circuitry 1020. The interface circuitry 1020 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a Peripheral Component Interconnect (PCI) interface, and/or a Peripheral Component Interconnect Express (PCIe) interface.
In the illustrated example, one or more input devices 1022 are connected to the interface circuitry 1020. The input device(s) 1022 permit(s) a user to enter data and/or commands into the processor circuitry 1012. The input device(s) 1022 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.
One or more output devices 1024 are also connected to the interface circuitry 1020 of the illustrated example. The output device(s) 1024 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or a speaker. The interface circuitry 1020 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.
The interface circuitry 1020 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 1026. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, an optical connection, etc.
The processor platform 1000 of the illustrated example also includes one or more mass storage devices 1028 to store software and/or data. Examples of such mass storage devices 1028 include magnetic storage devices, optical storage devices, floppy disk drives, HDDs, CDs, Blu-ray disk drives, redundant array of independent disks (RAID) systems, solid state storage devices such as flash memory devices and/or SSDs, and DVD drives.
The machine executable instructions 1032, which may be implemented by the machine readable instructions of
The cores 1102 may communicate by a first example bus 1104. In some examples, the first bus 1104 may implement a communication bus to effectuate communication associated with one(s) of the cores 1102. For example, the first bus 1104 may implement at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the first bus 1104 may implement any other type of computing or electrical bus. The cores 1102 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 1106. The cores 1102 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 1106. Although the cores 1102 of this example include example local memory 1120 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 1100 also includes example shared memory 1110 that may be shared by the cores (e.g., Level 2 (L2) cache) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 1110. The local memory 1120 of each of the cores 1102 and the shared memory 1110 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 1014, 1016 of
Each core 1102 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 1102 includes control unit circuitry 1114, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 1116, a plurality of registers 1118, the L1 cache 1120, and a second example bus 1122. Other structures may be present. For example, each core 1102 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 1114 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 1102. The AL circuitry 1116 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 1102. The AL circuitry 1116 of some examples performs integer based operations. In other examples, the AL circuitry 1116 also performs floating point operations. In yet other examples, the AL circuitry 1116 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations. In some examples, the AL circuitry 1116 may be referred to as an Arithmetic Logic Unit (ALU). The registers 1118 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 1116 of the corresponding core 1102. For example, the registers 1118 may include vector register(s), SIMD register(s), general purpose register(s), flag register(s), segment register(s), machine specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 1118 may be arranged in a bank as shown in
Each core 1102 and/or, more generally, the microprocessor 1100 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 1100 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages. The processor circuitry may include and/or cooperate with one or more accelerators. In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU or other programmable device can also be an accelerator. Accelerators may be on-board the processor circuitry, in the same chip package as the processor circuitry and/or in one or more separate packages from the processor circuitry.
More specifically, in contrast to the microprocessor 1100 of
In the example of
The interconnections 1210 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 1208 to program desired logic circuits.
The storage circuitry 1212 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 1212 may be implemented by registers or the like. In the illustrated example, the storage circuitry 1212 is distributed amongst the logic gate circuitry 1208 to facilitate access and increase execution speed.
The example FPGA circuitry 1200 of
Although
In some examples, the processor circuitry 1012 of
A block diagram illustrating an example software distribution platform 1305 to distribute software such as the example machine readable instructions 1032 of
Example methods, apparatus, systems, and articles of manufacture to enable in-field thermal calibration of computing devices are disclosed herein. Further examples and combinations thereof include the following:
Example 1 includes an apparatus comprising instructions, memory in the apparatus, and processor circuitry to execute the instructions to determine that a system on chip (SOC) package is deployed, the SOC package deployed with a default first thermal model, in response to the determination that the SOC package is deployed, monitor at least one temperature of the SOC package from a sensor and power usage of the SOC package, calibrate a second thermal model based on the at least one temperature and the power usage, and publish the calibrated second thermal model for control of the SOC package.
Example 2 includes the apparatus as defined in example 1, wherein the first thermal model and the second thermal model are unitary.
Example 3 includes the apparatus as defined in example 2, wherein the processor circuitry is to execute the instructions to replace data points associated with the first thermal model with data points of the monitoring of the at least one temperature and the power usage to calibrate the second thermal model.
Example 4 includes the apparatus as defined in example 1, wherein the processor circuitry is to execute the instructions to calibrate a third thermal model after a time duration of the SOC package being deployed.
Example 5 includes the apparatus as defined in example 1, wherein the processor circuitry is to execute the instructions to determine that a requisite number of data points pertaining to the at least one temperature and the power usage have been obtained while monitoring thereof, and in response to determining that the requisite number of the data points have been obtained, initiate the calibration of the second thermal model.
Example 6 includes the apparatus as defined in example 1, wherein the sensor includes an on-die temperature sensor of the SOC package.
Example 7 includes the apparatus as defined in example 1, wherein the processor circuitry is to cause at least one core of the SOC package to execute a pre-defined workload to calibrate the second thermal model.
Example 8 includes the apparatus as defined in example 1, wherein the processor circuitry is to execute the instructions to cause the SOC package to operate based on the first thermal model until the second thermal model is calibrated.
Example 9 includes the apparatus as defined in example 1, wherein the processor circuitry is to execute the instructions to calculate the power usage based on current drawn by the SOC package along with a voltage associated with the SOC package.
Example 10 includes the apparatus as defined in example 1, wherein the processor circuitry is to execute the instructions to calibrate the second thermal model while the SOC package is operated with routine workloads associated with deployment thereof.
Example 11 includes a non-transitory computer readable medium comprising instructions which, when executed, cause at least one processor to determine that a system on chip (SOC) package is deployed, the SOC package deployed with a default first thermal model, in response to the determination that the SOC package is deployed, monitor at least one temperature of the SOC package from a sensor and power usage of the SOC package, calibrate a second thermal model based on the at least one temperature and the power usage, and publish the calibrated second thermal model for control of the SOC package.
Example 12 includes the non-transitory computer readable medium as defined in example 11, wherein the first thermal model and the second thermal model are unitary.
Example 13 includes the non-transitory computer readable medium as defined in example 12, wherein the instructions cause the at least one processor to replace data points associated with the first thermal model with data points from the monitoring of the at least one temperature and the power usage to calibrate the second thermal model.
Example 14 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to calibrate a third thermal model after a time duration of the SOC package being deployed.
Example 15 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to determine that a requisite number of data points pertaining to the at least one temperature and the power usage have been obtained while monitoring thereof, and in response to determining that the requisite number of the data points have been obtained, initiate the calibration of the second thermal model.
Example 16 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to direct at least one core of the SOC package to execute a pre-defined workload to calibrate the second thermal model.
Example 17 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to direct the SOC package to operate based on the first thermal model until the second thermal model is calibrated.
Example 18 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to fit an equation to the second thermal model.
Example 19 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to calculate the power usage based on current drawn by the SOC package along with a voltage associated with the SOC package.
Example 20 includes the non-transitory computer readable medium as defined in example 11, wherein the instructions cause the at least one processor to calibrate the second thermal model while the SOC package is operated with routine workloads associated with deployment thereof.
Example 21 includes a method comprising determining, by executing instructions with at least one processor, that a system on chip (SOC) package is deployed, the SOC package deployed with a default first thermal model, in response to the determining that the SOC package is deployed, monitoring, by executing instructions with the at least one processor, at least one temperature of the SOC package from a sensor and power usage of the SOC package, calibrating, by executing instructions with the at least one processor, a second thermal model based on the at least one temperature and the power usage, and publishing, by executing instructions with the at least one processor, the calibrated second thermal model for control of the SOC package.
Example 22 includes the method as defined in example 21, wherein the first thermal model and the second thermal model are unitary.
Example 23 includes the method as defined in example 22, further including replacing, by executing instructions with the at least one processor, data points associated with the first thermal model with data points from the monitoring of the at least one temperature and the power usage to calibrate the second thermal model.
Example 24 includes the method as defined in example 21, further including calibrating, by executing instructions with the at least one processor, a third thermal model after a time duration of the SOC package being deployed.
Example 25 includes the method as defined in example 21, further including determining, by executing instructions with the at least one processor, that a requisite number of data points pertaining to the at least one temperature and the power usage have been obtained while monitoring thereof, and in response to the determining that the requisite number of data points have been obtained, initiating, by executing instructions with the at least one processor, the calibration of the second thermal model.
Example 26 includes the method as defined in example 21, further including directing, by executing instructions with the at least one processor, at least one core of the SOC package to execute an identified workload to calibrate the second thermal model.
Example 27 includes the method as defined in example 21, further including directing, by executing instructions with the at least one processor, the SOC package to operate based on the first thermal model until the second thermal model is calibrated.
Example 28 includes the method as defined in example 21, further including fitting, by executing instructions with the at least one processor, an equation to the second thermal model.
Example 29 includes the method as defined in example 21, further including calculating, by executing instructions with the at least one processor, the power usage based on current drawn by the SOC package along with a voltage associated with the SOC package.
From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that enable accurate control of computing devices and enable utilization of a full capability of the computing device. Examples disclosed herein personalize a thermal guard band used by SOCs to prevent violation of design limits. According to examples disclosed herein, a thermal guard band can be optimized according to individual characteristics of a system to utilize a greater potential of the system. In some initial testing, a 1-2% performance improvement was observed in contrast to a known one-size-fits-all approach. Examples disclosed herein have been demonstrated to proactively reduce frequency when a measured temperature rose to within 5° C. of a limit. Examples disclosed herein recover a significant portion of this performance loss by enabling prediction of when the temperature will overshoot, thereby tailoring temperature prediction to the actual system.
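The contrast between a fixed guard band and a prediction tailored to the actual system can be sketched as follows. The sketch assumes that the calibrated model is a linear equation mapping an expected increase in power to an expected temperature rise. The linear form, the function names, and the numeric values are illustrative assumptions and not measured results.

```python
def fixed_guard_band_throttle(temp_c: float, limit_c: float,
                              guard_band_c: float = 5.0) -> bool:
    """One-size-fits-all policy: throttle once within the guard band of the limit."""
    return temp_c >= limit_c - guard_band_c


def predictive_throttle(temp_c: float, limit_c: float,
                        power_increase_w: float,
                        slope_c_per_w: float, intercept_c: float) -> bool:
    """Calibrated policy (sketch): throttle only if the predicted temperature overshoots."""
    predicted_rise_c = slope_c_per_w * power_increase_w + intercept_c
    return temp_c + predicted_rise_c > limit_c


# Hypothetical operating point: 96 C measured, 100 C limit, 1 W expected increase,
# and a calibrated slope of 2 C per watt with zero intercept.
fixed_decision = fixed_guard_band_throttle(96.0, 100.0)
calibrated_decision = predictive_throttle(96.0, 100.0, 1.0, 2.0, 0.0)
```

At this hypothetical operating point, the fixed policy reduces frequency because the temperature is within 5° C. of the limit, whereas the calibrated prediction of 98° C. remains below the limit and frequency is maintained.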
Disclosed systems, methods, apparatus, and articles of manufacture improve the efficiency of using a computing device by enabling increased utilization thereof and, thus, enabling quicker completion of computing tasks based on available thermal overhead. Disclosed systems, methods, apparatus, and articles of manufacture are accordingly directed to one or more improvement(s) in the operation of a machine such as a computer or other electronic and/or mechanical device.
The following claims are hereby incorporated into this Detailed Description by this reference. Although certain example systems, methods, apparatus, and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, methods, apparatus, and articles of manufacture fairly falling within the scope of the claims of this patent.