Current general purpose processor designs (e.g., a main application processor) devote a large fraction, sometimes the largest fraction, of on-chip transistors to cache memory for temporarily storing processor information for quick access. Static general purpose processor on-chip transistor configurations for cache memory lead to largely underutilized cache memory resources, and burn a significant fraction of a system power budget to manage the cache resources even when they are not in use or are used inefficiently. These static cache memory configurations also result in inconsistent performance for different applications because of their rigid cache management policies. One cache memory configuration implementation does not provide consistent or desirable performance for all applications under varying circumstances. This is because, in part, different applications exhibit different cache memory access patterns, and the cache memory configuration may not be efficiently or desirably set up for the different cache memory access patterns. Inefficient or undesired use of the cache results in unfavorable performance, power consumption, and thermal generation.
The methods and apparatuses of various aspects provide circuits and methods for generating a cache memory configuration including applying machine learning to context data, determining a first cache memory configuration relating to the context data for a cache memory of a computing device, and predicting execution of an application on the computing device.
The methods and apparatuses of various aspects provide circuits and methods for configuring a cache memory of a computing device, including classifying a plurality of cache memory configurations based on at least a hardware data threshold of the computing device and first hardware data of the computing device in which the plurality of cache memory configurations are related to a predicted application execution, selecting a first cache memory configuration from the plurality of cache memory configurations in response to the classification of the plurality of cache memory configurations indicating the first cache memory configuration as classified for the first hardware data of the computing device, and configuring the cache memory at runtime based on the first cache memory configuration.
In an aspect, applying machine learning to context data may include applying machine learning to the context data and hardware data of the computing device related to the context data, and determining a first cache memory configuration relating to the context data for a cache memory of a computing device may include determining the first cache memory configuration relating to the context data and hardware data thresholds for the cache memory of the computing device.
An aspect method may further include correlating the predicted application and the first cache memory configuration.
An aspect method may further include validating the predicted application and the first cache memory configuration, storing the predicted application and the first cache memory configuration in response to the predicted application and the first cache memory configuration being valid, and altering the machine learning with an error value in response to the predicted application and the first cache memory configuration being invalid.
An aspect method may further include classifying a plurality of cache memory configurations based on at least a hardware data threshold of the computing device and first hardware data of the computing device in which the plurality of cache memory configurations are related to a predicted application execution, selecting the first cache memory configuration from the plurality of cache memory configurations in response to the classification of the plurality of cache memory configurations indicating the first cache memory configuration as classified for the first hardware data of the computing device, and configuring the cache memory at runtime based on the first cache memory configuration.
An aspect method may further include receiving a plurality of cache memory parameters, in which each of the plurality of cache memory parameters are associated with context data, at least one hardware data threshold of the computing device, the predicted application execution, and at least one cache memory configuration.
An aspect method may further include receiving second hardware data of the computing device after configuring the cache memory at runtime based on the selected first cache memory configuration, classifying the plurality of cache memory configurations based on at least one hardware data threshold of the computing device and the second hardware data of the computing device, selecting a second cache memory configuration from the plurality of cache memory configurations in response to the classification of the plurality of cache memory configurations indicating the second cache memory configuration as classified for the second hardware data of the computing device, and configuring the cache memory at runtime based on the second cache memory configuration.
In an aspect, the first hardware data and the second hardware data each may include at least one of cache memory related data, data related to a first processor in which the first processor is associated with a dedicated cache memory, data related to a second processor, and data related to a third processor in which the second processor and the third processor are associated with a shared cache memory.
An aspect includes a computing device having a cache memory, and a processor coupled to the cache memory and configured with processor-executable instructions to perform operations of one or more aspect methods described above. An aspect includes a computing device having means for performing functions of one or more of the aspect methods described above. An aspect includes a non-transitory processor-readable medium having stored thereon processor-executable software instructions to cause a processor to perform operations of one or more of the aspect methods described above.
The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate example aspects of the claims, and together with the general description given above and the detailed description given below, serve to explain the features of the claims.
The various aspects will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the claims.
The terms “computing device” and “mobile device” are used interchangeably herein to refer to any one or all of cellular telephones, smartphones, personal or mobile multi-media players, personal data assistants (PDAs), laptop computers, tablet computers, smartbooks, ultrabooks, palm-top computers, wireless electronic mail receivers, multimedia Internet enabled cellular telephones, wireless gaming controllers, and similar personal electronic devices that include a memory and a programmable processor. The term “computing device” may also refer to stationary devices, such as desktop computers and servers, including individual server blades and server systems. While the various aspects are particularly useful for mobile computing devices, such as smartphones, which have limited energy resources, the aspects are generally useful in any electronic device that implements a plurality of memory devices and a limited energy budget, where reducing the power consumption of the memory devices can extend the battery-operating time of the mobile computing device. The various aspects are also useful in any electronic device having a continuous power supply from a power source, such as an electrical utility grid or a power generator, such as a fuel cell or other alternative or renewable energy source, where performance of the electronic device may be increased while potentially increasing power consumption.
The term “system-on-chip” (SoC) is used herein to refer to a set of interconnected electronic circuits typically, but not exclusively, including a hardware core, a memory, and a communication interface. A hardware core may include a variety of different types of processors, such as a general purpose processor, a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), an accelerated processing unit (APU), an auxiliary processor, a single-core processor, and a multi-core processor. A hardware core may further embody other hardware and hardware combinations, such as a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), other programmable logic device, discrete gate logic, transistor logic, performance monitoring hardware, watchdog hardware, and time references. Integrated circuits may be configured such that the components of the integrated circuit reside on a single piece of semiconductor material, such as silicon.
The methods, systems, and devices described herein improve computing device performance by implementing a configurable cache memory, and by matching the cache memory structure and management to applications' needs. Configuring the cache memory structure and management is accomplished using computing device cache usage predictions developed by models from context information for improving performance, energy efficiency, and/or thermal generation when executing different applications.
Context information used in making computing device cache usage predictions may be provided from multiple sources. The context information may include data related to an application executed by the computing device. The application and application data may be analyzed offline to provide a profile of the application and its usage. The application may be analyzed offline using multiple techniques, including static, dynamic, or a hybrid of static and dynamic analysis of the application's code and its execution on the computing device.
Computing device application execution information used in making computing device cache usage predictions may include the usage information during the execution of the application, as well as the application's needs, such as data locality and which features of the computing device execution the application uses. The context information may be gathered from a specific computing device and/or from a population of similar computing devices.
Other context information used in making computing device cache usage predictions may include usage and state information from the computing device. The computing device usage and state information may include a variety of parameters including general historic usage information (e.g., types of usage at various locations and times), hardware counter information that indicates the frequency with which various hardware elements are used or accessed, cache memory configuration and usage data (e.g., working set size and number of reads, writes, and misses), and computing device state information (e.g., processing frequency, temperature, current leakage, and power availability).
An offline model, which may include a variety and/or combination of learning models and first classification models, uses and correlates the context information to develop user behavior predictions, such as identifying a likely next application for execution, group of applications for execution, or manner of using the computing device. Learning models analyze the context information to determine different sets of context information that correlate with the different user behaviors. First classification models analyze the correlated sets of context information and user behaviors to determine the likelihood of specific user behaviors or application executions for various parameters included in the context information. The resulting user behavior and application predictions may be subjected to model validation to test whether the user behavior and application predictions are accurate. The model validation may result in rejection of the user behavior predictions, in response to which the learning model and first classification model may use data from the model validation to update the user behavior and application predictions. The model validation may result in approval of the user behavior and application predictions, in response to which user behavior and application predictions may be made available for use by the computing device. For example, a database relating user behavior and application predictions with context information may be accessible by the computing device, and user behavior and application predictions may be provided to the computing device based on parameters aligned with the associated context information.
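By way of illustration only, the following Python sketch shows one way the offline learning and first classification steps described above might be realized; the feature names and the choice of a scikit-learn classifier are assumptions for illustration and are not intended to limit the scope of the claims.

    # Offline sketch: learn to predict the next application from context data.
    # Feature names and the classifier choice are illustrative assumptions.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def to_feature_vector(record):
        # Hypothetical context features: time of day, location cluster,
        # previously executed application, and a cache usage summary.
        return [record["hour_of_day"], record["location_cluster"],
                record["prev_app_id"], record["cache_miss_rate"]]

    def train_app_predictor(context_records):
        X = [to_feature_vector(r) for r in context_records]
        y = [r["next_app_id"] for r in context_records]  # observed behavior
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)
        accuracy = model.score(X_test, y_test)  # input to model validation
        return model, accuracy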
A second classification model may be implemented to determine cache memory configuration threshold classifications for the user behavior and application predictions at run-time. The second classification model may analyze the user behavior and application predictions, along with historical and current usage information, to determine whether it is appropriate to implement cache memory configurations for the user behavior and application predictions, modify cache memory configurations for the user behavior and application predictions, or ignore the user behavior and application predictions. The second classification model may be provided with multiple user behavior and application predictions and may determine one or more cache memory configuration threshold classifications for implementing cache memory configuration. The second classification model may generate cache memory configuration threshold classifications for configuring various cache memory parameters (e.g., cache memory activation/deactivation, reservation for a particular use, size, level usage settings, associativity, line size, and management policy). The second classification model may determine the various threshold classifications for optimizing the cache memory configuration based on the computing device historical and current usage information. The threshold classifications may be associated with certain operating parameters of the computing device such that the threshold classifications provide cache memory configuration parameters tailored to the operating parameters. The second classification model may determine the cache memory configuration parameters to use for configuring the cache memory based on an analysis of the computing device usage information compared with the cache memory configuration threshold classifications. The computing device usage information, such as the operating parameters of the computing device, may be compared to the threshold classifications, and the threshold classification best suited for the computing device usage information may be selected for implementing the associated cache memory configuration.
The cache memory configuration parameters, which may take the form of a cache memory configuration vector, associated with the determined cache memory configuration threshold classifications may be used by a cache memory configuration engine for configuring an entire cache memory or portions/partitions of a cache memory. The cache memory configuration engine may receive the cache memory configuration parameters to use for configuring the cache memory based on the analysis of the computing device usage information compared with the cache memory configuration threshold classifications. The cache memory configuration engine may modify the configuration of a cache memory by modifying the cache memory configuration parameters (e.g., cache memory activation/deactivation, reservation for a particular use, size, level usage settings, associativity, line size, and management policy) of the cache memory as provided from the second classification model.
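For illustration, a cache memory configuration vector and the engine that applies it might be sketched as follows; the field names and the register-write hook are hypothetical, since actual controls are platform specific.

    # Illustrative cache memory configuration vector and configuration engine.
    # Field names and the write_register hook are hypothetical assumptions.
    from dataclasses import dataclass

    @dataclass
    class CacheConfigVector:
        active: bool        # cache memory activation/deactivation
        reserved_for: str   # reservation for a particular use (e.g., "gpu")
        size_kb: int        # cache or partition size
        associativity: int  # number of ways
        line_size: int      # bytes per cache line
        policy: str         # management policy, e.g., "LRU"

    class CacheConfigEngine:
        def __init__(self, write_register):
            self.write_register = write_register  # platform-specific hook

        def apply(self, v: CacheConfigVector):
            # Modify the cache memory configuration parameters in hardware;
            # reservation and policy would map to additional controls.
            self.write_register("CACHE_ENABLE", int(v.active))
            self.write_register("CACHE_SIZE_KB", v.size_kb)
            self.write_register("CACHE_WAYS", v.associativity)
            self.write_register("CACHE_LINE_BYTES", v.line_size)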
The methods, systems, and devices for implementing the configurable cache memory may be implemented on multiple platforms, with different elements executed on different computing devices depending on the processing capabilities and power constraints of the computing device having the configurable cache memory. Such computing devices may include servers, mobile devices, and other computing devices with processing capabilities and at least intermittent connections to a network (e.g., the Internet, such that these devices are part of the Internet of Things). In an example, a server may include sufficient processing and power resources to implement the models, model validation, model classification, and cache memory configuration. In another example, a mobile device may have processing and power budgets that make implementing the learning and first classification models and model validation on a server more efficient, and the mobile device may implement the model classification and cache memory configuration. In a further example, the processing and power budgets of an Internet of Things computing device may be such that it is more efficient to implement the learning and first classification models, model validation, and model classification on a server, and the Internet of Things computing device may implement the cache memory configuration. In another example, multiple computing devices may implement cache memory configuration models for implementing an application; for example, a cloud server may configure its cache memory for providing a cloud-implemented service, and a mobile device may configure its cache memory to interact with the cloud server.
The methods, systems, and devices of the various aspects may be implemented on a variety of different computing device or system architectures. Therefore, references to particular types of processor configurations or system architectures are for example purposes only, and are not intended to limit the scope of the claims. In particular, three different system architectures on which the various aspects may be implemented are illustrated in
The memory 16 of the SoC 12 may be a volatile or non-volatile memory configured for storing data and processor-executable code for access by the processor 14. In an aspect, one or more memories 16 may include volatile memories such as random access memory (RAM) or main memory, cache memory, or hardware registers. These memories 16 may be configured to temporarily hold a limited amount of data and/or processor-executable code instructions that are requested from non-volatile memory, loaded to the memories 16 from non-volatile memory in anticipation of future access based on a variety of factors, and/or intermediary processing data and/or processor-executable code instructions produced by the processor 14 and temporarily stored for future quick access without being stored in non-volatile memory. In an aspect, the memory 16 may be configured to store data for implementing cache memory configuration parameter generation and self-adaptive cache memory configuration operations (described further with reference to
The computing device 10 and/or SoC 12 may include one or more memories 16 configured for various purposes. In an aspect, one or more memories 16 may be configured to be dedicated to storing the cache memory configuration parameters that dictate the behavior and accessibility of a configured cache memory. In an aspect, one or more memories 16 may be configured to be dedicated to storing the hardware data and the context data for delayed access for the cache memory configuration parameter generation and the self-adaptive cache memory configuration operations. When the memory 16 storing the cache memory configuration parameters and/or the hardware data and the context data is non-volatile, the memory 16 may retain the cache memory configuration parameters and/or the hardware data and the context data even after the power of the computing device 10 has been shut off. When the power is turned back on and the computing device 10 reboots, the cache memory configuration parameters and/or the hardware data and the context data stored in non-volatile memory 16 may be available to the computing device 10.
The communication interface 18, communication component 22, antenna 26, and/or network interface 28 may work in unison to enable the computing device 10 to communicate over a wireless network 30 via a wireless connection 32, and/or a wired network 44 with the remote computing device 50. The wireless network 30 may be implemented using a variety of wireless communication technologies, including, for example, radio frequency spectrum used for wireless communications, to provide the computing device 10 with a connection to the Internet 40 by which it may exchange data with the remote computing device 50. In an aspect, the computing device 10 may transmit the hardware data and the context data to the computing device 50. In an aspect, the computing device 50 may transmit cache memory configuration parameters to the computing device 10.
The storage interface 20 and the storage component 24 may work in unison to allow the computing device 10 to store data on a non-volatile storage medium. The storage component 24 may be configured much like an aspect of the memory 16 in which the storage component 24 may store the cache memory configuration parameters and/or the hardware data and the context data, such that the parameters and data may be accessed by one or more processors 14. The storage component 24, being non-volatile, may retain the cache memory configuration parameters and/or the hardware data and the context data even after the power of the computing device 10 has been shut off. When the power is turned back on and the computing device 10 reboots, the cache memory configuration parameters and/or the hardware data and the context data stored on the storage component 24 may be available to the computing device 10. The storage interface 20 may control access to the storage component 24 and allow the processor 14 to read data from and write data to the storage component 24.
Some or all of the components of the computing device 10 may be differently arranged and/or combined while still serving the necessary functions. Moreover, the computing device 10 may not be limited to one of each of the components, and multiple instances of each component may be included in various configurations of the computing device 10.
The processor cores 200, 201, 202, 203 may be heterogeneous in that the processor cores 200, 201, 202, 203 of a single processor 14 may be configured for different purposes and/or have different performance characteristics. The heterogeneity of such heterogeneous processor cores may include different instruction set architectures, pipelines, operating frequencies, etc.
In the example illustrated in
In an aspect, the processor cores 200, 201, 202, 203 may have associated dedicated cache memories 204, 206, 208, 210. Like the memory 16 in
In an aspect, the processor cores 200, 201, 202, 203 may have associated shared cache memories 212, 214. The shared cache memories 212, 214 may be configured to perform similar functions to the dedicated cache memories 204, 206, 208, 210. However, the shared cache memories 212, 214 may each be in communication with more than one of the processor cores 200, 201, 202, 203 (e.g., processor core 0 and processor core 1 are paired with shared cache memory 0, and processor core 2 and processor core 3 are paired with shared cache memory 1). Each processor core 200, 201, 202, 203 is shown to be in communication with only one shared cache memory 212, 214; however, the number of shared cache memories is not meant to be limiting and may vary for each processor core 200, 201, 202, 203. Similarly, each shared cache memory is shown to be in communication with only two processor cores 200, 201, 202, 203; however, the number of processor cores is not meant to be limiting and may vary for each shared cache memory 212, 214. The processor cores 200, 201, 202, 203 in communication with the same shared cache memory 212, 214, may be grouped together in a processor cluster as described further herein.
The dedicated cache memories 204, 206, 208, 210 and shared cache memories 212, 214 are also similar to the same components described in
In an aspect, the processors and processor cores described herein need not be located on the same SoC or processor to share a shared cache memory. The processors and processor cores may be distributed across various components while maintaining a connection to the same shared cache memory as one or more other processors or processor cores.
The system cache 402 may be a shared memory device in the SoC 12 used to replace or supplement cache memories that may be associated with the various processors and/or subsystems. The system cache 402 may centralize the cache memory resources of the SoC 12 so that the various processors and subsystems may access the system cache 402 to read and write program commands and data designated for repeated and/or quick access. The system cache 402 may store data from the various processors and subsystems, and also from other memory devices of the computing device, such as main memory, the RAM 428, and the storage device (e.g., a hard disk drive). In an aspect, the system cache 402 may be backed up by such memory and storage devices in case a cache miss occurs because an item requested from the system cache 402 cannot be located. In an aspect, the system cache 402 may be used as scratchpad memory for the various processors and subsystems. The system cache 402 may be smaller in storage space and physical size than a combination of the local cache memories of an SoC of similar architecture that does not employ a system cache 402. However, management of the system cache 402 as described further herein may allow for greater energy conservation and equal or better performance speed of the SoC 12 despite the system cache's smaller storage space and physical size, and may allow for use of a simple software call flow.
The system cache controller 404 may manage access to and maintenance of the system cache 402 by the various processors and subsystems. Part of the access management of the system cache 402 may include managing the partitioning of the system cache memory space. The system cache memory space may be partitioned in a variety of manners, including, but not limited to, by cache words, cache lines, cache pages, cache ways, cache sets, cache banks, a partition indication field in a cache tag, or a combination of these parameters. Partitioning the system cache memory space may result in cache memory partitions of various sizes and locations in the system cache memory space. The size, location, and other aspects of the cache memory partitions may be dictated by the cache memory configuration parameters (discussed further with reference to
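As one illustrative sketch of such partitioning, the following assigns cache ways to partitions and encodes each partition as a bit mask; the mask encoding is an assumption, since a controller may equally partition by sets, banks, or tag fields as noted above.

    # Sketch of way-based partitioning: assign contiguous ways to partitions
    # and return per-partition bit masks (bit i set => way i is in partition).
    def way_partition_masks(total_ways, partition_sizes):
        assert sum(partition_sizes) <= total_ways
        masks, next_way = [], 0
        for size in partition_sizes:
            masks.append(((1 << size) - 1) << next_way)
            next_way += size
        return masks

    # Example: a 16-way system cache split into 8-, 4-, and 4-way partitions.
    print([bin(m) for m in way_partition_masks(16, [8, 4, 4])])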
The CPU clusters 406 may include groupings of several general purpose processors and/or general purpose processor cores. The CPU clusters 406 may access and maintain the system cache 402 via the system cache controller 404. Communications between the CPU clusters 406 and the system cache controller 404 may be converted by a protocol converter 408 from a standard or proprietary protocol of one of the CPU clusters 406 and the system cache controller 404 to a protocol suitable for the other in order to achieve interoperability between them. The CPU clusters 406 may send system cache access requests and cache maintenance and status commands specifying a particular cache memory partition to the system cache controller 404. In return, the system cache controller 404 may allow or deny access to the specified cache memory partition, return the information stored in the specified cache memory partition to the CPU clusters 406, and implement the cache maintenance and status commands.
Similar to the CPU clusters 406, specialized processors, like the GPU 410, the modem DSP 412, and the application DSP 414, may access and maintain the system cache 402 via the system cache controller 404. Communications between the specialized processors 410, 412, 414, and the system cache controller 404 may be managed by dedicated, individual memory interfaces 416. In an aspect, the memory interfaces 416 may manage communications between multiple similar or disparate specialized processors 410, 412, 414, and the system cache controller 404.
Various subsystems, like the camera subsystem 418, the video subsystem 420, and the display subsystem 422, may similarly access and maintain the system cache 402 via the system cache controller 404 and memory interfaces 416. The NoC 424 may manage the communication traffic between the subsystems 418, 420, 422, and the system hub 400 as well as other components of the SoC 12.
The system cache controller 404 may also manage accesses to the RAM 428 by the various processors and subsystems of the SoC 12. While the various processors and subsystems may make direct access requests to the RAM 428 via the memory controller 426, in certain instances system cache access requests may be directed to the RAM 428. In an aspect, system cache access requests may result in cache misses when the information requested from a specified component cache is not found in the specified component cache. As a result, the system cache controller 404 may direct the system cache access requests to the RAM 428 to retrieve the requested information not found in the component cache. In an aspect, the request for the information directed to the RAM 428 may be directed first to the memory controller 426 that may control access to the RAM 428. The request for the information directed to the RAM 428 may be sent by the system cache controller 404, and the resulting information may be returned to the system cache controller 404 to be written to the cache memory partition and returned from the cache memory partition to the components making the system cache access requests. In an aspect, the resulting information may be returned directly, or via the system cache controller 404, to the components making the system cache access requests without being written to the component cache.
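The miss-handling flow described above might be sketched as follows; the partition and memory controller interfaces are hypothetical placeholders for the hardware behavior.

    # Sketch of the miss-handling flow: on a miss in the specified component
    # cache, fetch from RAM via the memory controller; optionally fill the
    # partition, or return the data directly without caching it.
    def handle_access(partition, address, memory_controller, allocate=True):
        data = partition.lookup(address)         # hypothetical interface
        if data is not None:
            return data                          # hit in the component cache
        data = memory_controller.read(address)   # miss: request goes to RAM
        if allocate:
            partition.fill(address, data)        # write into the partition
        return data                              # or returned directly, uncached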
In some aspects, portions of the system cache controller 404 may be implemented and configured in hardware and/or firmware to perform operations of the aspect methods. In some aspects, the system cache controller 404 may be a programmable controller that is configured by controller-executable instructions to perform operations of the aspect methods. In some aspects, the system cache controller 404 may be implemented and configured through a combination of firmware and controller-executable instructions to perform operations of the aspect methods.
The descriptions herein of SoC 12 and its various components are only meant to be exemplary and in no way limiting. Several of the components of the SoC 12 may be variably configured, combined, and separated. Several of the components may be included in greater or fewer numbers, and may be located and connected differently within the SoC 12 or separate from the SoC 12. Similarly, numerous other components, such as other memories, processors, subsystems, interfaces, and controllers, may be included in the SoC 12 and in communication with the system cache controller 404 in order to access the system cache 402.
The context data component 502 may be configured to collect context data from various sources, store the context data, and provide the context data to the self-adaptive cache memory configuration system 500 for implementing cache memory configuration parameter generation. Context data may include application and computing device usage user behavior from the computing device 10 and/or other computing devices exhibiting similarities with the computing device 10 (e.g., type of computing device, geographic location of the computing device, computing device user characteristics, similar application and computing device usage user behavior, etc.). Context data may also include application profiles and seed context data for computing devices that do not have at least a requisite amount of application and computing device usage user behavior history. Compositions of context data are discussed further with reference to
The learning component 504 may implement machine learning to determine cache memory configuration vectors associated with various context data. The learning component 504 may receive context data related to certain applications and/or historical computing device usage user behavior from the context data component 502. In an aspect, the context data received from the context data component 502 may be supplemented with specific hardware data related to the context information received from the computing device 10. The data used by the learning component 504 is discussed further with reference to
The first classification component 506 may implement prediction algorithms to determine the next likely application or group of applications for use by the computing device. In an aspect, the first classification component 506 may determine a likely computing device usage user behavior. The first classification component 506 may use the context information, including historical data, provided from the context data component 502 or the learning component 504. The first classification component 506 may also take into account any available hardware data of the computing device 10. The likely next application, group of applications, and/or computing device usage user data may be correlated with a cache memory configuration vector.
The validation component 508 may implement validation algorithms to check the accuracy of the results from the learning component 504 and the first classification component 506. The validation component 508 may determine whether the predictions of the first classification component 506 are accurate within certain margins of error, which may vary depending on the platform of the computing device 10 and the importance of meeting the goals used for implementing the learning component 504. For example, for a consumer computing device the margin of error may be relatively high compared to the margin of error allowed in a professional or enterprise environment. Validated predictions and cache memory configuration vectors may be provided to the cache memory configuration parameter storage device 510. Invalid predictions and cache memory configuration vectors may result in providing the learning component 504 and the first classification component 506 with data relevant to why the predictions were invalid. The learning component 504 and the first classification component 506 may incorporate the validation data into their algorithms in order to improve the accuracy of the cache memory configuration vectors or the predicted applications, group of applications, or computing device usage user behavior.
The cache memory configuration parameter storage device 510 may store cache memory configuration models for configuring the cache memory 518 of the computing device 10. Each cache memory configuration model may include relevant context information, memory configuration vectors, and the predicted applications, group of applications, and/or computing device usage user behaviors. The information stored on the cache memory configuration parameter storage device 510 may be stored in a manner in which the information for a cache memory configuration model is relationally linked with its relevant information. The storage of cache memory configuration model information is discussed further with reference to
The hardware data component 512 may include various types of memory, including volatile and non-volatile memory, for keeping track of aspects of the performance of the computing device running particular applications, groups of applications, or during particular computing device usage user behavior. For example, the hardware data component 512 may keep track of data indicating the number and types of accesses to different portions of the cache memory of the computing device 10. Other hardware data tracked by the hardware data component 512 may include cache misses, cache hits, set access frequency, and stack distance/dead set count for various dedicated or shared memory components and processors/processor cores. The hardware data component 512 may use hardware counters of the computing device 10 to track the hardware data. The hardware data component 512 may make the hardware data available to the learning component 504, the first classification component 506, and the second classification component 514.
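An illustrative snapshot of such hardware data might look like the following; the counter names mirror the metrics listed above, while the collection interface is assumed.

    # Illustrative container for tracked hardware data. Counter names follow
    # the metrics above; how counters are read from hardware is assumed.
    from dataclasses import dataclass, field

    @dataclass
    class HardwareDataSnapshot:
        cache_hits: int = 0
        cache_misses: int = 0
        set_access_counts: dict = field(default_factory=dict)  # set -> count

        def set_access_frequency(self, set_index):
            # Percentage of tracked accesses that went to this cache set.
            total = sum(self.set_access_counts.values()) or 1
            return 100.0 * self.set_access_counts.get(set_index, 0) / total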
The second classification component 514 may receive numerous cache memory configuration models from the cache memory configuration parameter storage device 510. The second classification component 514 may be used to select the cache memory configuration model and the cache memory configuration vector for configuring the cache memory for a predicted application or group of applications, and a predicted computing device usage user behavior. The cache memory configuration models under consideration may be based on the predictions and the hardware data. The second classification component 514 may use known classification techniques to select the appropriate cache memory configuration model and vector, including, for example, classification and regression tree analysis, support vector machine analysis, and small neural networks. The second classification component 514 may determine a cache memory configuration vector to provide to the cache configuration engine 516.
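By way of example, the decision-tree variant of this second classification step might be sketched as below; the hardware feature layout and training data are assumptions for illustration.

    # Sketch of the second classification step using a decision tree (one of
    # the techniques named above). The feature layout is an assumption.
    from sklearn.tree import DecisionTreeClassifier

    def train_config_selector(hardware_samples, chosen_config_ids):
        # hardware_samples: rows of [miss_rate, set_access_freq, working_set_kb]
        return DecisionTreeClassifier(max_depth=4).fit(hardware_samples,
                                                       chosen_config_ids)

    def select_config(selector, current_hw, config_vectors):
        config_id = selector.predict([current_hw])[0]
        return config_vectors[config_id]  # vector handed to the engine 516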
The cache configuration engine 516 may receive the cache memory configuration vector selected by the second classification component 514, and implement the configuration of the cache memory 518.
In various aspects, the self-adaptive cache memory configuration system 500 may include components included in any one or a combination of computing devices 10, 50. Different computing devices 10, 50 may have different capabilities and available resources for implementing the components of the self-adaptive cache memory configuration system 500. For example, a mobile device may have sufficient processing power and power budget to implement the second classification component 514 and the cache configuration engine 516, and further include the hardware data component 512 and the cache memory 518. In another example, a server may have sufficient processing power and power budget to implement most or all of the components of the self-adaptive cache memory configuration system 500. In a further example, an Internet of Things device, like a connected household appliance or wearable device, may only have the processing power to implement the cache configuration engine 516, along with the cache memory 518 and the hardware data component 512. Any of the components not implemented on the computing device 10 that includes the cache memory 518 may be implemented on a remote computing device 50, and their outputs provided to the computing device 10.
The learning and/or first classification model 600 may receive the hardware data of the computing device indicating which hardware resources of the computing device are used, and how, in various situations. The hardware data may include cache memory usage information, including which portions of the cache memory are accessed, how often those portions of cache memory are accessed, and how often accesses to those portions of cache memory fail and succeed, as well as processor/processor core data (e.g., number of memory access requests, processing frequency, current leakage, temperature, and power draw) for a processor/processor core associated with a dedicated cache memory or multiple processors/processor cores associated with a shared cache memory. The processor/processor core data may be individual to each processor/processor core associated with the shared cache memory or combined based on association with the shared cache memory. Further, the learning and/or first classification model 600 may receive seed information 614, for example, for a computing device that does not have a requisite amount of historical data to be useful in the learning and/or first classification model 600. The seed information 614 may include a conglomeration of historical data from a variety of other computing devices.
The learning and/or first classification model 600 may input a number of feature vectors 602 that may include selected information from the data 610, the application profiles 612, the hardware data, and the seed information 614. The learning and/or first classification model 600 may apply the feature vectors 602 to classifiers 604, which may implement learning and/or first classification model operations to generate a predicted application(s) or user behavior 606. As discussed above, the learning and/or first classification model operations may include the operations for implementing various learning and/or first classification models. The learning and/or first classification model 600 may output at least the predicted application(s) or user behavior 616, and may further output context information correlated to the predicted application(s) or user behavior 606.
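A feature vector 602 might, purely for illustration, be assembled from these inputs as follows; the keys and the fallback to seed information are assumptions.

    # Sketch of assembling a feature vector 602 from the model inputs above.
    # Keys and the seed-information fallback are hypothetical.
    def build_feature_vector(context_data, app_profile, hardware_data, seed=None):
        hw = hardware_data if hardware_data else (seed or {})
        return [
            context_data.get("hour_of_day", -1),
            context_data.get("location_cluster", -1),
            app_profile.get("working_set_kb", 0),
            hw.get("cache_miss_rate", 0.0),
            hw.get("mem_requests_per_sec", 0.0),
        ]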
The example in
Continuing with the example in
The groups of cache memory configuration parameters 818-826 may be provided to the second classification component individually, in groups, on demand/by request, or by scheduled transmission, using internal communication, external communication, direct communication, or broadcast. It is contemplated that portions of a group of cache memory configuration parameters 818-826 may be provided, rather than an entire group of cache memory configuration parameters 818-826. As discussed further with reference to
The example cache memory 900 includes five partitions 934-942. Each cache memory partition 934-942 may be dictated by a different cache memory configuration vector and associated for use by a predicted application within a context for the computing device having the cache memory 900. In an example, each cache memory partition 934-942 may correspond with one of the pairs of threshold hardware data values 806, 810, 814 and cache memory configuration vectors 808, 812, 816 of the groups of cache memory configuration parameters 818-826 as described with reference to
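As an illustration of this association, each partition might be recorded with its predicted application and configuration vector as follows; the application names and vector values are hypothetical.

    # Illustrative association of the five partitions of cache memory 900 with
    # predicted applications and configuration vectors (values hypothetical).
    partitions = {
        "partition_934": {"app": "browser", "vector": ("64K",  "2-way", "LRU")},
        "partition_936": {"app": "camera",  "vector": ("128K", "4-way", "LRU")},
        "partition_938": {"app": "game",    "vector": ("256K", "8-way", "LRU")},
        "partition_940": {"app": "maps",    "vector": ("64K",  "1-way", "FIFO")},
        "partition_942": {"app": "music",   "vector": ("32K",  "1-way", "LRU")},
    }

    def partition_for(predicted_app):
        # Look up which partition and vector serve a predicted application.
        for name, entry in partitions.items():
            if entry["app"] == predicted_app:
                return name, entry["vector"]
        return None, None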
In block 1006, the computing device may apply a learning model to the context and/or hardware data. The learning model may be any or a combination of known learning models. In optional block 1008, the computing device may correlate the context data and the hardware data. Correlating the context data and the hardware data may only be achievable if the computing device received hardware data in optional block 1004. Correlating the context data and the hardware data allows the computing device to track relevant data for use in predicting usage of the configurable cache memory computing device and determining potential configurations for improving the function of the configurable cache memory computing device. Correlations may be made on a variety of bases, such as correlating context data that matches the hardware data for the execution of a particular application or particular user behavior on the configurable cache memory computing device. In block 1010, the computing device, executing the learning model, may determine one or more cache memory configuration vectors associated with one or more correlated hardware and context data. The hardware data associated with the cache memory configuration vectors may include acceptable ranges/thresholds of hardware data for the cache memory configuration vectors. Examples of the correlated hardware and context data being associated with the one or more cache memory configuration vectors are described with reference to
In block 1012, the computing device may apply the first classification model to the received and/or correlated hardware and context data. The first classification model may be any or a combination of known classification models. In block 1014, the computing device may determine a predicted application(s) or user behavior. As described with reference to
The correlation of the different data, predictions, and configuration vectors may help avoid potential misalignment of data, predictions, and configurations with similar other data, predictions, or configurations. For example, the execution of the same predicted application may benefit from different cache memory configurations for different context data and/or hardware data. The following example illustrates this point. A context of non-work hours may be indicative of pre-work hours, post-work hours, or weekend hours. A predicted application may be predicted to be executed during one or more of the non-work hour contexts. However, the computing device may be used differently during the pre-work hours, post-work hours, and weekend hours. Thus, a cache memory configuration for the same application may vary for the different contexts. Therefore, correlating the data, predictions, and configuration vectors may help the correct cache memory configuration to be implemented for the application during the correct context.
In determination block 1018, the computing device may validate the predicted application(s) or user behavior and associated cache memory configuration vectors. Validating may include determining whether the predictions and configurations are likely to improve the function of the configurable cache memory computing device. The computing device may use algorithms or simulations to determine whether the performance of the configurable cache memory computing device is likely to improve based on whether the results of the validation fall within an acceptable range of validation values. In response to determining that the predicted application(s) or user behavior and associated cache memory configuration vectors are invalid (i.e., determination block 1018 = “No”), the computing device may return to apply the learning model to the context and/or hardware data in block 1006. The validation procedure may return error values, which the computing device may use to alter the application of the learning model or to select a different learning model to improve the results.
In response to determining that the predicted application(s) or user behavior and associated cache memory configuration vectors are valid (i.e., determination block 1018 = “Yes”), the computing device may store the cache memory configuration parameters, in block 1020. As described with reference to
In an aspect, the method 1000 may be executed by the computing device offline. Offline execution may be execution on a computing device different from the configurable cache memory computing device, or on the configurable cache memory computing device, but not during runtime of the predicted application(s) or user behavior.
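The validate-then-store loop of blocks 1006-1020 might be sketched as follows; the learn, validate, and store callables stand in for the components described above, and their signatures are assumptions.

    # Sketch of the offline loop: re-run learning with an error value until the
    # prediction/vector pair validates, then store the parameters (block 1020).
    def offline_iteration(learn, validate, store, context, hardware, max_tries=10):
        error = None
        for _ in range(max_tries):
            prediction, vectors = learn(context, hardware, error)  # block 1006
            valid, error = validate(prediction, vectors)           # block 1018
            if valid:
                store(prediction, vectors)                         # block 1020
                return prediction, vectors
        raise RuntimeError("no valid cache memory configuration found")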
In block 1104, the cache configurable computing device may receive or retrieve its own hardware data. The cache configurable computing device may access the stored hardware data, such as hardware counters or hardware data stored in volatile or non-volatile memory devices. In block 1106, the cache configurable computing device may apply a second classification model to the cache memory configuration parameters and/or hardware data. The second classification model may be any or a combination of known classification models. In block 1108, the cache configurable computing device may select a cache memory configuration vector for use in configuring the cache memory of the cache configurable computing device. In an aspect, the second classification model may use the threshold hardware data values of the cache memory configuration parameters to select which of the cache memory configuration vectors to use. A visual representation of the process of selecting the appropriate cache memory configuration vector may be illustrated by the following example of a set of rules that the machine learning algorithm learns:
if 40%<D<=50% then set cache to <128K, 1-way>;
if 30%<D<=40% then set cache to <128K, 1-way>;
if D>50% then set cache to <256K, 2-way>.
In the above example, the hardware threshold values are related to the corollary, D=1−S, of set access frequency, S, of the cache memory. In an aspect, the parameter for determining the hardware threshold values may be set access frequency, S, itself. The set access frequency may count the percentage of accesses to a cache set within a time interval. Depending on the percentage of accesses to a cache set within a time interval, the cache configurable computing device may select a different designated cache memory configuration vector having a cache memory partition size value and a cache memory associativity. This example is not meant to be limiting, and the hardware data, hardware data thresholds, cache memory configuration vectors, and the parameters, means, and methods for comparing the hardware data and hardware data thresholds may vary.
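Transcribed directly into code for illustration (with D = 1 − S as defined above), the learned rule set might read:

    # The example rule set above, expressed as a selection function.
    # D is the corollary of set access frequency S: D = 1 - S (in percent).
    def select_vector(d_percent):
        if d_percent > 50:
            return ("256K", "2-way")
        if 40 < d_percent <= 50:
            return ("128K", "1-way")
        if 30 < d_percent <= 40:
            return ("128K", "1-way")
        return None  # below every threshold: keep the current configuration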
In block 1110, the cache configurable computing device may configure the cache memory based on the selected cache memory configuration vector. For example, the cache memory configuration component may receive the cache memory configuration vector from the second classification component and control access and management of the cache memory according to the parameter values of the selected cache memory configuration vector. In an aspect, the cache memory configuration component may not itself control access to and management of the cache memory, but may instruct cache memory controllers to control access to and management of the cache memory. In block 1112, the cache configurable computing device may retrieve hardware data relating to the new configuration of the cache memory. In block 1114, the cache configurable computing device may provide the hardware data to the computing device for implementing the method 1000 as described with reference to
In an aspect, various blocks 1102-1116 of the method 1100 may be executed by the configurable cache memory computing device during runtime of the configurable cache memory computing device. In an aspect, various blocks 1102-1116 of the method 1100 may be executed by the configurable cache memory computing device offline. In an aspect, offline execution may be execution on the configurable cache memory computing device or a computing device different from the configurable cache memory computing device, but not during runtime of the predicted application(s) or user behavior.
The mobile device 1200 may have one or more radio signal transceivers 1208 (e.g., Peanut, Bluetooth, Zigbee, Wi-Fi, RF radio) and antennae 1210, for sending and receiving communications, coupled to each other and/or to the processor 1202. The transceivers 1208 and antennae 1210 may be used with the above-mentioned circuitry to implement the various wireless transmission protocol stacks and interfaces. The mobile device 1200 may include a cellular network wireless modem chip 1216 that enables communication via a cellular network and is coupled to the processor.
The mobile device 1200 may include a peripheral device connection interface 1218 coupled to the processor 1202. The peripheral device connection interface 1218 may be singularly configured to accept one type of connection, or may be configured to accept various types of physical and communication connections, common or proprietary, such as USB, FireWire, Thunderbolt, or PCIe. The peripheral device connection interface 1218 may also be coupled to a similarly configured peripheral device connection port (not shown).
The mobile device 1200 may also include speakers 1214 for providing audio outputs. The mobile device 1200 may also include a housing 1220, constructed of a plastic, metal, or a combination of materials, for containing all or some of the components discussed herein. The mobile device 1200 may include a power source 1222 coupled to the processor 1202, such as a disposable or rechargeable battery. The rechargeable battery may also be coupled to the peripheral device connection port to receive a charging current from a source external to the mobile device 1200. The mobile device 1200 may also include a physical button 1224 for receiving user inputs. The mobile device 1200 may also include a power button 1226 for turning the mobile device 1200 on and off.
The various aspects described above may also be implemented within a variety of mobile devices, such as a laptop computer 1300 illustrated in
The various aspects (including, but not limited to, aspects discussed above with reference to
Computer program code or “program code” for execution on a programmable processor for carrying out operations of the various aspects may be written in a high level programming language such as C, C++, C#, Smalltalk, Java, JavaScript, Visual Basic, a Structured Query Language (e.g., Transact-SQL), Perl, or in various other programming languages. Program code or programs stored on a computer readable storage medium as used in this application may refer to machine language code (such as object code) whose format is understandable by a processor.
Many computing device operating system kernels are organized into a user space (where non-privileged code runs) and a kernel space (where privileged code runs). This separation is of particular importance in Android and other general public license (GPL) environments where code that is part of the kernel space must be GPL licensed, while code running in the user-space may not be GPL licensed. It should be understood that the various software components/modules discussed here may be implemented in either the kernel space or the user space, unless expressly stated otherwise.
The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the operations of the various aspects must be performed in the order presented. As will be appreciated by one of skill in the art, the order of operations in the foregoing aspects may be performed in any order. Words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the operations; these words are simply used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles “a,” “an” or “the” is not to be construed as limiting the element to the singular.
The various illustrative logical blocks, modules, circuits, and algorithm operations described in connection with the various aspects may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and operations have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the claims.
The hardware used to implement the various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Alternatively, some operations or methods may be performed by circuitry that is specific to a given function.
In one or more aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable medium or a non-transitory processor-readable medium. The operations of a method or algorithm disclosed herein may be embodied in a processor-executable software module that may reside on a non-transitory computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable storage media may be any storage media that may be accessed by a computer or a processor. By way of example but not limitation, such non-transitory computer-readable or processor-readable media may include RAM, ROM, EEPROM, FLASH memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of non-transitory computer-readable and processor-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
The preceding description of the disclosed aspects is provided to enable any person skilled in the art to make or use the claims. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the claims. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.