The temperature of an electronic device is determined by retained heat. Retained heat is the difference between generated heat and dissipated heat. The thermal behavior of an electronic device is strongly related to the device's platform type. However, other factors also contribute to an electronic device's thermal behavior. These factors include usage of the electronic device and external factors such as the surface supporting the electronic device, ambient temperature, or humidity, among others.
Certain examples are described in the following detailed description and in reference to the drawings, in which:
Techniques for monitoring the thermal health of an electronic device are discussed herein. For example, a system for monitoring the thermal health may predict an expected temperature of the electronic device. A difference between the actual temperature of the electronic device and the expected temperature may then be computed. A z-score may be computed for the difference between the actual temperature and the expected temperature, and mapped to a thermal health grade for the electronic device.
In certain situations, the electronic device may have inadequate heat dissipation. These situations may result in uncomfortable handling or a shortening of the lifespan of the electronic device.
The techniques described herein may use electronic device data and machine learning techniques to train a model to evaluate the thermal health of a device. In particular, a trained model may be used to produce a thermal health grade for an electronic device based on the thermal properties of the device. The grade given to the electronic device may become worse as heat dissipation becomes more inadequate. The techniques discussed herein may be used to detect when an electronic device may need to be serviced. As such, the techniques discussed herein may extend the lifespan of the electronic device.
The data collected during data collection 102 may be of two types, descriptive features and instrument features. The descriptive features may include such things as device platform, form factor, cooling system, CPU model, and a number of CPUs in the device. These descriptive features may be used to group the data of devices with similar physical characteristics. Knowing the device platform or product line may be useful for classifying an electronic device into an appropriate group. Otherwise, knowing the form factor, cooling system, and CPU model may be enough to group an electronic device.
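The grouping rule described above can be sketched as follows. This is a minimal illustration, not the described system's implementation: the dictionary keys (`platform`, `form_factor`, `cooling_system`, `cpu_model`) and both function names are assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def group_key(device: dict) -> Hashable:
    """Prefer the platform; otherwise fall back to physical characteristics.

    All dictionary keys used here are hypothetical names for the
    descriptive features.
    """
    platform = device.get("platform")
    if platform:
        return ("platform", platform)
    return (
        "physical",
        device["form_factor"],
        device["cooling_system"],
        device["cpu_model"],
    )


def group_devices(devices: Iterable[dict]) -> Dict[Hashable, List[dict]]:
    """Collect devices that share the same grouping key."""
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for device in devices:
        groups[group_key(device)].append(device)
    return dict(groups)
```

Devices whose platform is known land in a platform group, while a device with no platform entry is grouped with others that share its form factor, cooling system, and CPU model.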
The instrument features may include the data received from sensors that detect the temperature of an electronic device and other parameters that influence the thermal behavior of the device over time. These other parameters may include CPU usage, fan speed, battery usage, battery temperature, device age, and GPU usage, among others. For example, CPU usage and GPU usage may be expressed as a percentage of the time the CPU or GPU is in use, the fan speed may be provided on a scale from 0 to 100, and the battery usage may be true or false depending on whether the battery is in use or not.
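One possible shape for a single sample of instrument features is sketched below. The class name, field names, and units (degrees Celsius, days) are assumptions made for illustration. Only the 0 to 100 ranges and the true/false battery flag come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """One sample of instrument features (all names and units are illustrative)."""

    temperature_c: float          # measured device temperature
    cpu_usage_pct: float          # share of time the CPU is in use, 0 to 100
    gpu_usage_pct: float          # share of time the GPU is in use, 0 to 100
    fan_speed: float              # fan speed on a 0 to 100 scale
    on_battery: bool              # True while the battery is in use
    battery_temperature_c: float  # measured battery temperature
    device_age_days: int          # device age

    def validate(self) -> None:
        """Reject values outside the 0 to 100 scales."""
        for name in ("cpu_usage_pct", "gpu_usage_pct", "fan_speed"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} must be in [0, 100], got {value}")
```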
Different device sensors may be offered by different manufacturers. Better thermal health grading may result if more sensors are available to detect the different parameters affecting the thermal health of an electronic device. For example, a more accurate thermal health grade may be obtained if an electronic device has sensors for CPU usage, fan speed, battery usage, and device age than if the electronic device only has sensors for CPU usage and device age. Furthermore, more frequent sampling may result in improved confidence in the thermal health grade for an electronic device. For example, samples collected hourly may provide a more accurate thermal health grade than samples collected daily.
In model training 104, machine learning 110 may result in trained models 112. Machine learning methods may include decision tree learning, association rule learning, neural networks, deep learning, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, similarity and metric learning, sparse dictionary learning, rule-based machine learning, and learning classifier systems. For example, decision tree learning uses a decision tree as a predictive model which maps observations about an item, represented by the branches, to conclusions about the item's target value, represented by the leaves.
Decision trees where the target variable can take on continuous values, such as the temperature of an electronic device, are called regression trees. Decision tree learning may result in a random forest model. A random forest model may be linear or non-linear. Other types of models may be obtained using other machine learning methods. The other types of models may be static, dynamic, explicit, implicit, discrete, continuous, deterministic, probabilistic, deductive, inductive, or floating.
Using machine learning 110, a model may be trained to predict the temperature of an electronic device based on CPU usage, fan speed, and battery usage. For example, a random forest model may have a multitude of predictive trees constructed at training time and output the mean prediction of the individual regression trees. The mean prediction may be the temperature of an electronic device.
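The averaging step of a random forest can be illustrated with a few hand-written stand-in trees. The three trees below are not trained. Their thresholds, leaf temperatures, and feature names are invented solely to show how the individual predictions are combined into a mean.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Features = Dict[str, object]
Tree = Callable[[Features], float]


# Hand-written stand-ins for trained regression trees (hypothetical values).
def tree_a(f: Features) -> float:
    return 70.0 if f["cpu_usage_pct"] > 50 else 45.0


def tree_b(f: Features) -> float:
    return 72.0 if f["fan_speed"] < 30 else 62.0


def tree_c(f: Features) -> float:
    return 60.0 if f["on_battery"] else 66.0


def forest_predict(trees: Sequence[Tree], features: Features) -> float:
    """Return the mean of the individual tree predictions."""
    return mean(tree(features) for tree in trees)


sample = {"cpu_usage_pct": 80.0, "fan_speed": 60.0, "on_battery": False}
expected_temperature = forest_predict([tree_a, tree_b, tree_c], sample)
```

For the sample above, the trees predict 70.0, 62.0, and 66.0, so the forest prediction is 66.0.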
Like some decision tree models, the random forest model can accept non-numeric data types, including Boolean variables such as battery usage and categorical variables such as form factor. Beyond this, the random forest model may generalize to unforeseen situations. In addition, the random forest model may learn more parameters and accommodate a more complex target feature. Furthermore, the random forest model has the flexibility to rank the parameters by impact on the target feature. For example, the random forest model may rank fan speed, battery usage, and CPU usage by impact on the temperature of an electronic device.
Returning to
The root mean square error (RMSE) may be computed for the trained models 112 using a cross-validation train-test partitioning. The RMSE is the sample standard deviation of the differences between the actual temperatures and the temperatures predicted by the trained model 112 for a certain device platform or product line. The technique of computing RMSE using cross-validation train-test partitioning provides an estimate of model prediction performance. The technique involves partitioning a sample of data into complementary or non-overlapping subsets, training the model on one subset called the training set, and computing the RMSE on the other subset called the testing set. A maximum acceptable RMSE may be used to decide if a trained model 112 is accurate enough to be used in grading 106.
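A minimal sketch of the RMSE check follows. To stay self-contained, a one-feature least-squares line stands in for the trained model, which is an assumption of this example and not the random forest described earlier. The fold count, the seed, and the `MAX_ACCEPTABLE_RMSE` value are likewise hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (cpu_usage_pct, actual_temperature)

MAX_ACCEPTABLE_RMSE = 2.0  # hypothetical acceptance threshold


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root of the mean squared difference between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(samples: Sequence[Sample]) -> Callable[[float], float]:
    """Least-squares line, used here only as a stand-in for model training."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_rmse(samples: Sequence[Sample], k: int = 5, seed: int = 0) -> float:
    """Average test-set RMSE over k non-overlapping folds."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds: List[List[int]] = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        test = [samples[i] for i in test_idx]
        model = fit_line(train)  # train on the training set only
        scores.append(rmse([y for _, y in test], [model(x) for x, _ in test]))
    return sum(scores) / len(scores)


def accurate_enough(samples: Sequence[Sample]) -> bool:
    """Accept the model only if its cross-validated RMSE is within the threshold."""
    return cross_validated_rmse(samples) <= MAX_ACCEPTABLE_RMSE
```

Each sample appears in exactly one test fold, so every prediction that contributes to the score is made by a model that did not see that sample during fitting.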
To be reliable, a grading model may be trained on a minimum number of different device platforms or product lines. Also, a reliable grading model may be trained on a minimum number of devices for each type of device platform or product line. For example, a grading model may be reliable if trained using at least 15 days of daily data collections per device and at least 30 different types of device platforms or product lines.
The trained model 112 may represent the thermal behavior of a device platform or product line. The trained model 112 may generalize to new device platforms or product lines. However, a new device platform or product line may suffer from the cold start problem, i.e., a lack of information about the new device platform or product line. Models may be applied hierarchically following the device product hierarchy to avoid the cold start problem. For example, there may be models for platforms X, Y, and Z. Platform X may not have enough data records to train a model. There may be a second model trained on all platforms of the same form factor, for example, platforms Y and Z. The second model may generalize to platform X. If the second model does not generalize, there may be a model for the platform family that generalizes to platform X. Movement up the hierarchy may continue until a model that generalizes to platform X is found.
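The walk up the product hierarchy can be sketched as a lookup from the most specific level to the most general one. The level labels, the `generalizes` callback, and the function name are assumptions of this sketch. How generalization is actually judged is not specified above. A test-set RMSE threshold would be one possibility.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Model = Callable[[dict], float]


def select_model(
    levels: Sequence[str],
    models: Dict[str, Model],
    generalizes: Callable[[Model], bool],
) -> Optional[Tuple[str, Model]]:
    """Return the first level (most specific first) whose model generalizes.

    `levels` lists hypothetical hierarchy labels for one device, for example
    ["platform:X", "form_factor:laptop", "family:business"].
    """
    for level in levels:
        model = models.get(level)
        if model is not None and generalizes(model):
            return level, model
    return None  # no usable model anywhere in the hierarchy
```

With no model trained for platform X, the lookup falls through to the form-factor model, and to the family model only if the form-factor model is judged not to generalize.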
The trained model 112 may predict the average temperature given all possible device conditions expressed as instrument features. By calculating the difference between the actual temperature and the predicted temperature, it may be possible to grade the thermal health of an electronic device. However, if a single temperature difference is calculated, the thermal health grade may be inaccurate because of data noise and changes in device usage. To correct for these inaccuracies, the differences between the actual temperatures from the last N data records and the model predictions may be calculated and averaged. From the average of the differences, a z-score may be calculated and mapped to a thermal health grade.
The trained models 112 may have low RMSEs, so it may be assumed that the differences between the actual temperatures and the expected temperatures may follow a Gaussian distribution such as that depicted in
The z-score can be calculated for Gaussian distributions. A z-score is the number of standard deviations a data point is above or below the average value of what is being measured. For the techniques described herein, a z-score is the number of standard deviations that the average difference between actual and predicted temperatures for N data records is above or below the average value for the temperature difference for all electronic devices in a data repository of a certain platform type or product line. A z-score is calculated using Eqn. 1.
z-score=(x−μ)/σ Eqn. 1
In Eqn. 1, the term x represents the average difference between the actual and predicted temperatures for N data records. The term μ represents the distribution average, the average of the differences between the actual and expected temperatures, for all the devices in the data repository that share the same platform or product line. The term σ represents the standard deviation for the distribution.
As an example, a z-score of 3.0 for the average difference between the actual and predicted temperatures for the last N data records is 3.0 standard deviations to the right of the distribution average. A z-score of −2.2 for the average difference between the actual and predicted temperatures for the last N data records is 2.2 standard deviations to the left of the distribution average.
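Eqn. 1 can be worked through in a few lines of code. The function names are assumptions, and so is the use of the population standard deviation (`statistics.pstdev`), since the description does not say whether a population or sample standard deviation is intended.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def distribution_parameters(device_averages: Sequence[float]) -> Tuple[float, float]:
    """Return (mu, sigma) for the temperature differences of one platform.

    `device_averages` holds the actual-minus-expected differences for all
    devices of the platform in the data repository.
    """
    return mean(device_averages), pstdev(device_averages)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Eqn. 1: number of standard deviations x lies above or below mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma
```

As a numeric illustration with invented values, take μ = 0.5 and σ = 2.0. An average difference of 6.5 degrees gives a z-score of (6.5 − 0.5)/2.0 = 3.0, and an average difference of −3.9 degrees gives approximately −2.2.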
After computing the z-score, the thermal health grade of an electronic device may be determined by mapping the z-score to a value based on a function or a table like the one shown in
The thermal health grade for the electronic device may be on a scale from 0 to 100 as shown in
The system 500 may include a memory device 504 that stores instructions that are executable by the CPU 502. The CPU 502 may be coupled to the memory device 504 by a bus 506. The memory device 504 may include random access memory (e.g., SRAM, DRAM, zero capacitor RAM, SONOS, eDRAM, EDO RAM, DDR RAM, RRAM, PRAM, etc.), read only memory (e.g., Mask ROM, PROM, EPROM, EEPROM, etc.), flash memory, or any other suitable memory system. The memory device 504 can be used to store data and computer-readable instructions that, when executed by the CPU 502, direct the CPU 502 to perform various operations in accordance with embodiments described herein.
The system 500 may also include a storage device 508. The storage device 508 may be a physical memory device such as a hard drive, an optical drive, a flash drive, an array of drives, or any combinations thereof. The storage device 508 may store data as well as programming code such as device drivers, software applications, operating systems, and the like. The programming code stored by the storage device 508 may be executed by the CPU 502.
The storage device 508 may include a data sensor 510, a model trainer 512, an expected temperature predictor 514, and a computation manager 516. The data sensor 510 may accomplish the tasks associated with data collection 102 in
The data sensor 510 may detect the temperature of an electronic device and other parameters that influence the device's thermal behavior over time. The data may be collected and stored in data records. A data record may include temperature, CPU usage, fan speed, and battery use of the electronic device. The data records may be stored in a data repository 518.
The model trainer 512 may train a model using the data records from the data repository 518. Using machine learning, a model may be trained to predict the temperature of an electronic device based on CPU usage, fan speed, and battery usage. There are a number of machine learning techniques that may be used to train a variety of models. For example, a random forest model may be trained by constructing a multitude of decision trees. A model may be trained for each type of device platform or product line.
The expected temperature predictor 514 may use the trained model for the appropriate device platform or product line to predict the expected temperature of an electronic device. The trained model may use the CPU usage, fan speed, and battery usage to predict the expected temperature. For a random forest model, the expected temperature is the mean prediction of the individual trees constructed during the machine learning phase.
The computation manager 516 may determine the thermal health grade for an electronic device. To accomplish this, the computation manager 516 may include a temperature difference calculator 520, a z-score calculator 522, and a z-score mapper 524. The temperature difference calculator 520 may calculate the difference between the actual temperatures of the last N data records and the model predictions. The average of the N differences between the actual and expected temperatures may be calculated by the temperature difference calculator 520.
The z-score calculator 522 may calculate the z-score for the average temperature difference calculated by the temperature difference calculator 520. Because the temperature differences for a particular device platform or product line follow a Gaussian distribution, the z-score may be the number of standard deviations that the average temperature difference is above or below the average value for the distribution.
The z-score mapper 524 may map the z-score to a thermal health grade for the electronic device. The mapping of the z-score to a value may be accomplished using a function or a table similar to the one in
The system 500 may be used to monitor the thermal health grade of an electronic device. The thermal health grade may decrease as the thermal health of the electronic device degrades. Once the thermal health grade has fallen to a certain point, maintenance may be necessary to prevent further degradation of the thermal health of the electronic device and possible irreparable damage. Furthermore, the system 500 may be used to determine if the maintenance was effective at improving the thermal health of the electronic device.
The system 500 may also include a display 526. The display 526 may be a touchscreen built into the device. For example, the touchscreen may include a touch entry system. Alternatively, the display 526 may be an interface that couples to an external display. In this example, a human machine interface may couple to input devices, such as mice, keyboards, and the like. The display 526 may show the thermal health grade of an electronic device. The display 526 may also show any of the data used to calculate the thermal health grade, e.g., from data records to z-scores. The display 526 may further display a recommendation for maintenance if the thermal health grade is at or below a predetermined threshold.
The system 500 may include an input/output (I/O) device interface 528 to connect the system 500 to one or more I/O devices 530. For example, the I/O devices 530 may include a scanner, a keyboard, and a pointing device such as a mouse, a touchpad, or touchscreen, among others. The I/O devices 530 may be built-in components of the system 500, or may be devices that are externally connected to the system 500.
The system 500 may further include a network interface controller (NIC) 532 to provide a wired communication to the cloud 534. The cloud 534 may be in communication with the data repository 518. The system 500 may communicate with the data repository 518 via the NIC 532 and the cloud 534.
The block diagram of
At block 704, a model may be trained using the data collected at block 702. Using machine learning, a model may be trained to predict the temperature of an electronic device based on CPU usage, fan speed, and battery usage. In particular, the trained model may be a random forest model. A model may be trained for each type of device platform or product line.
At block 706, the trained model may be used to predict the expected temperature of an electronic device. Inputs to the trained model may include CPU usage, fan speed, and battery usage. From these inputs, the expected temperature is predicted. The expected temperature may be predicted N times using the last N data records for a particular type of device platform or product line.
At block 708, the difference between the actual temperature and expected temperature may be computed. Each data record may include the temperature of the electronic device in addition to CPU usage, fan speed, and battery usage. The calculated difference is between the actual temperature in a data record and the expected temperature predicted using CPU usage, fan speed, and battery usage contained in the same data record. The difference between the actual temperature and expected temperature may be computed N times using the last N data records for a particular type of device platform or product line. The N differences between the actual and expected temperatures may be averaged.
At block 710, a z-score may be computed for the difference between the actual temperature and expected temperature of the electronic device. The z-score may be calculated because the temperature differences for a given type of device platform or product line follow a Gaussian distribution much like the one shown in
At block 712, the z-score may be mapped to a thermal health grade. The mapping of the z-score to a value may be accomplished using a function or a table similar to the one in
The process flow diagram of
As described herein, the non-transitory, computer-readable medium 900 may include code 906 to direct the processor 902 to predict the expected temperature using a model. Code 908 may be included to direct the processor 902 to compute the difference between the actual and expected temperature. Code 910 may be included to direct the processor 902 to compute the z-score for the difference between the actual temperature and the expected temperature. Code 912 may be included to direct the processor 902 to map the z-score to a thermal health grade for the electronic device.
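The four steps (predict, difference, z-score, map) can be strung together as in the sketch below. Everything specific in it is an assumption: the record keys, the stand-in model, and in particular the linear mapping in `grade_from_z`, which is a hypothetical function and not the mapping table referred to in this description.

```python
from typing import Callable, Sequence


def grade_from_z(z: float, z_best: float = 0.0, z_worst: float = 3.0) -> int:
    """Hypothetical mapping of a z-score onto a 0 to 100 grade.

    z at or below z_best maps to 100, z at or above z_worst maps to 0,
    and values in between are scaled linearly.
    """
    clamped = min(max(z, z_best), z_worst)
    return round(100.0 * (z_worst - clamped) / (z_worst - z_best))


def thermal_health_grade(
    records: Sequence[dict],
    model: Callable[[dict], float],
    mu: float,
    sigma: float,
    n: int,
) -> int:
    """Grade a device from its last n data records."""
    recent = records[-n:]
    # Predict the expected temperature and take actual minus expected.
    differences = [r["temperature"] - model(r) for r in recent]
    average_difference = sum(differences) / len(differences)
    # Eqn. 1, then the (hypothetical) mapping to a grade.
    z = (average_difference - mu) / sigma
    return grade_from_z(z)


# Stand-in for a trained model (invented coefficients).
def example_model(record: dict) -> float:
    return 50.0 + 0.25 * record["cpu_usage_pct"]


records = [
    {"temperature": 63.0, "cpu_usage_pct": 40.0},
    {"temperature": 63.0, "cpu_usage_pct": 40.0},
]
grade = thermal_health_grade(records, example_model, mu=0.0, sigma=2.0, n=2)
```

In this invented example the device runs 3 degrees above the expected 60 degrees on both records, which is 1.5 standard deviations above the distribution average and maps to a grade of 50 under the assumed linear mapping.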
The block diagram of
Using the table 400 in
The techniques described herein may be applied to many types of electronic devices, independent of model, platform, or manufacturer. Furthermore, comparisons between models, platforms, and manufacturers may be made using the techniques described herein. The data-driven techniques have a learning component that may result in thermal models that are up-to-date. Storing of data in a large data repository may make it possible to execute machine learning in a scalable way. Scalability involves the constant addition of new data that is used to update the trained models. Trained models may be reused, thereby avoiding the need for data reprocessing. Training of the models may occur without any human intervention.
The techniques described herein may provide early detection of abnormal thermal behavior of an electronic device. A maintenance alert may be triggered, so that engineers can investigate and determine the root cause of the abnormal thermal behavior. Moreover, the techniques described herein may be used for prototyping a new electronic device. Engineers may use the techniques to train a model for the new device and compare the model to models for other electronic devices to facilitate the identification of bottlenecks in the heat dissipation of the new device.
A model may not have to be trained immediately for a new electronic device. Further, a model may be trained for a particular type of electronic device and may generalize to a new version of the electronic device. For example, a model may be trained with data from a workstation. When a new version of the workstation is released, the model may generalize to the new version without having to be retrained. However, generalization may be limited after a certain point and the model may eventually have to be retrained for the new version of the electronic device.
While the present techniques may be susceptible to various modifications and alternative forms, the examples discussed above have been shown only by way of example. It is to be understood that the techniques are not intended to be limited to the particular examples disclosed herein. Indeed, the present techniques include all alternatives, modifications, and equivalents falling within the scope of the present techniques.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/028114 | 4/18/2017 | WO | 00 |