Embodiments generally relate to power management. More particularly, embodiments relate to adaptive hardware acceleration based on runtime power efficiency determinations.
Heterogeneous computing systems may use central processing units (CPUs) as well as hardware accelerators to handle workloads. Typically, the accelerator, which may include a relatively large number of processor cores, may have the fixed role of performing parallel data processing. The CPU, on the other hand, may have the fixed role of performing non-parallel data processing such as sequential code execution or data transfer management. Such a fixed work distribution may not be power efficient for all types of workloads: it may underutilize the CPU, it may be limited to single CPU-accelerator combinations, and it may waste time transferring data between accelerators and CPUs.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Turning now to
Table I below shows one example of a set of rules 20 that might be configured and/or used by the power efficiency logic 10 when the workload 14 is audio content (e.g., received from an audio driver) that may be selectively “tunneled” to the hardware accelerator 16 (e.g., a DSP) for further processing.
Thus, in the first item listed in Table I, the hint for user interaction activity being in the “yes” state may indicate that execution of the workload 14 on the host processor 18 will be more power efficient than execution of the workload 14 on the hardware accelerator 16. Such a condition may arise due to the host processor 18 already being active as well as the host processor 18 being performance competitive with the hardware accelerator 16 for the particular type of workload 14. On the other hand, in the third item listed in Table I, the hint for user interaction activity being in the “no” state while the system is in a low power state may indicate that execution of the workload 14 on the hardware accelerator 16 will be more power efficient than execution of the workload 14 on the host processor 18. This condition may arise due to power losses associated with bringing the host processor 18 out of the low power state. Additionally, there may be power losses associated with bringing the rest of the SoC (system on chip) out of the low power state. Other rules and notifications may be used, depending on the circumstances. Moreover, the rules may be dynamically configured/adapted at runtime to achieve a more flexible solution.
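A rule set such as the one discussed above might be expressed in software as an ordered list of predicates over the current hint states. The following sketch is purely illustrative; the hint names, rule ordering, and default target are assumptions for the example and are not taken from any particular implementation.

```python
# Illustrative rule table in the spirit of Table I. Hint names
# ("user_interaction", "low_power", etc.) and targets are hypothetical.
HOST = "host_processor"
ACCEL = "hardware_accelerator"

# Each rule pairs a predicate over the current hint states with a
# scheduling target; rules are evaluated in order and the first match wins.
RULES = [
    (lambda h: h.get("user_interaction") == "yes", HOST),
    (lambda h: h.get("video_encoding") == "yes", HOST),
    (lambda h: h.get("low_power") == "yes"
               and h.get("user_interaction") == "no", ACCEL),
]

def power_efficiency_target(hints, default=ACCEL):
    """Return the execution target suggested by the first matching rule."""
    for predicate, target in RULES:
        if predicate(hints):
            return target
    return default

def add_rule(predicate, target):
    """Rules may be reconfigured/extended dynamically at runtime."""
    RULES.append((predicate, target))
```

The runtime reconfiguration mentioned above then reduces to mutating the rule list, e.g. `add_rule(lambda h: h.get("touch_boost") == "yes", HOST)`.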
Illustrated processing block 28 provides for registering with a power hardware access layer (HAL) for receipt of one or more runtime usage notifications (e.g., user interaction hints, video encoding hints, video decoding hints, web browsing hints, touch boost hints, etc.). Block 28 may be conducted offline (e.g., prior to runtime). One or more runtime usage notifications may be received at block 30, wherein illustrated block 32 makes a power efficiency determination based on at least one of the runtime usage notification(s). Block 32 may include applying one or more configurable rules to the runtime usage notification(s). Block 32 may also provide for configuring one or more of the rules at runtime. A determination may be made at block 34 as to whether the power efficiency determination indicates that execution of a workload on a hardware accelerator will be more efficient than execution of the workload on a host processor. If so, the workload may be scheduled for execution on the hardware accelerator at block 36. If, on the other hand, the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator, block 38 may schedule the workload for execution on the host processor.
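The block flow above might be sketched as follows. All class and method names here (the HAL callback interface, the `execute` method on the targets, the rule semantics) are hypothetical stand-ins chosen for the example, not an actual API.

```python
class PowerEfficiencyLogic:
    """Minimal sketch of blocks 28-38; all interfaces are hypothetical."""

    def __init__(self, hal, rules):
        self.rules = rules
        self.hints = {}
        # Block 28: register with the power HAL for runtime usage
        # notifications (may be conducted offline, prior to runtime).
        hal.register(self.on_notification)

    def on_notification(self, name, state):
        # Block 30: receive a runtime usage notification and record it.
        self.hints[name] = state

    def accelerator_more_efficient(self):
        # Block 32: apply the configurable rules to the current hint states.
        return all(rule(self.hints) for rule in self.rules)

    def schedule(self, workload, accelerator, host):
        # Blocks 34-38: dispatch the workload to whichever target the
        # power efficiency determination favors.
        if self.accelerator_more_efficient():
            accelerator.execute(workload)  # block 36
        else:
            host.execute(workload)         # block 38
```

Here runtime reconfiguration of the rules (also part of block 32) would simply mutate `self.rules` between determinations.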
The dotted line components in
The HAL 52 may therefore send the runtime usage notifications 12 to the power efficiency logic 10, which may accept workloads from the kernel 54 and automatically determine whether to schedule the workloads for execution on a hardware accelerator or a host processor.
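The flow from the HAL 52 through the power efficiency logic 10 to a target might be glued together as in the following sketch. The publish/subscribe shape, the trivial stand-in determination, and all names are assumptions made for illustration only.

```python
import queue

class PowerHAL:
    """Hypothetical stand-in for HAL 52: forwards each runtime usage
    notification to every registered subscriber."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def notify(self, hint, state):
        for cb in self.subscribers:
            cb(hint, state)

class SimpleLogic:
    """Hypothetical stand-in for logic 10."""
    def __init__(self, hal):
        self.hints = {}
        hal.subscribe(self.hints.__setitem__)  # receive notifications
    def prefers_accelerator(self):
        # Trivial stand-in determination: offload when no user interaction.
        return self.hints.get("user_interaction") != "yes"

def kernel_dispatch_loop(workloads, logic, accelerator, host):
    """Drain workloads handed down by the kernel (cf. kernel 54) and let
    the power efficiency logic pick the execution target for each one."""
    while not workloads.empty():
        w = workloads.get_nowait()
        target = accelerator if logic.prefers_accelerator() else host
        target.append(w)
```

In a real system the dispatch loop would hand the workload to a driver rather than append it to a list; the lists here merely make the routing decision visible.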
The illustrated system 56 also includes an input output (IO) module 66 implemented together with the processor 18 on a semiconductor die 68 as a system on chip (SoC), wherein the IO module 66 functions as a host device and may communicate with, for example, a display 70 (e.g., touch screen, liquid crystal display/LCD, light emitting diode/LED display), a network controller 72, the hardware accelerator 16, and mass storage 74 (e.g., hard disk drive/HDD, optical disk, flash memory, etc.). The illustrated IO module 66 may include the logic 10 that makes power efficiency determinations at runtime based on runtime usage notifications and automatically decides whether to execute workloads on the processor 18 or the hardware accelerator 16 based on the power efficiency determinations. Thus, the logic 10 may perform one or more aspects of the method 26 (
Additional Notes and Examples:
Example 1 may include an adaptive computing system comprising a hardware accelerator, a host processor, and logic, implemented at least partly in one or more of configurable logic or fixed functionality logic hardware, to make a power efficiency determination at runtime based on one or more runtime usage notifications, schedule a workload for execution on the hardware accelerator if the power efficiency determination indicates that execution of the workload on the hardware accelerator will be more efficient than execution of the workload on the host processor, and schedule the workload for execution on the host processor if the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator.
Example 2 may include the system of Example 1, wherein the logic is to apply one or more configurable rules to at least one of the one or more runtime usage notifications.
Example 3 may include the system of Example 2, wherein the logic is to configure at least one of the one or more configurable rules at runtime.
Example 4 may include the system of Example 1, wherein the logic is to register with a power hardware access layer for receipt of the one or more runtime usage notifications, and wherein the one or more usage notifications are to indicate one or more of user interaction activity, video encoding activity, video decoding activity, web browsing activity or touch boost activity.
Example 5 may include the system of any one of Examples 1 to 4, wherein the workload is to include an audio playback workload.
Example 6 may include the system of any one of Examples 1 to 4, wherein the hardware accelerator includes one or more of an audio digital signal processor, a sensor or a graphics accelerator.
Example 7 may include a power efficiency apparatus comprising logic, implemented at least partly in one or more of configurable logic or fixed functionality logic hardware, to make a power efficiency determination at runtime based on one or more runtime usage notifications, schedule a workload for execution on a hardware accelerator if the power efficiency determination indicates that execution of the workload on the hardware accelerator will be more efficient than execution of the workload on a host processor, and schedule the workload for execution on the host processor if the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator.
Example 8 may include the apparatus of Example 7, wherein the logic is to apply one or more configurable rules to at least one of the one or more runtime usage notifications.
Example 9 may include the apparatus of Example 8, wherein the logic is to configure at least one of the one or more configurable rules at runtime.
Example 10 may include the apparatus of Example 7, wherein the logic is to register with a power hardware access layer for receipt of the one or more runtime usage notifications, and wherein the one or more usage notifications are to indicate one or more of user interaction activity, video encoding activity, video decoding activity, web browsing activity or touch boost activity.
Example 11 may include the apparatus of any one of Examples 7 to 10, wherein the workload is to include an audio playback workload.
Example 12 may include the apparatus of any one of Examples 7 to 10, wherein the hardware accelerator is to include one or more of an audio digital signal processor, a sensor or a graphics accelerator.
Example 13 may include a method of operating a power efficiency apparatus, comprising making a power efficiency determination at runtime based on one or more runtime usage notifications, scheduling a workload for execution on a hardware accelerator if the power efficiency determination indicates that execution of the workload on the hardware accelerator will be more efficient than execution of the workload on a host processor, and scheduling the workload for execution on the host processor if the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator.
Example 14 may include the method of Example 13, wherein making the power efficiency determination includes applying one or more configurable rules to at least one of the one or more runtime usage notifications.
Example 15 may include the method of Example 14, further including configuring at least one of the one or more configurable rules at runtime.
Example 16 may include the method of Example 13, further including registering with a power hardware access layer for receipt of the one or more runtime usage notifications, wherein the one or more usage notifications indicate one or more of user interaction activity, video encoding activity, video decoding activity, web browsing activity or touch boost activity.
Example 17 may include the method of any one of Examples 13 to 16, wherein the workload includes an audio playback workload.
Example 18 may include the method of any one of Examples 13 to 16, wherein the hardware accelerator includes one or more of an audio digital signal processor, a sensor or a graphics accelerator.
Example 19 may include at least one computer readable storage medium comprising a set of instructions, which when executed by a computing device, cause the computing device to make a power efficiency determination at runtime based on one or more runtime usage notifications, schedule a workload for execution on a hardware accelerator if the power efficiency determination indicates that execution of the workload on the hardware accelerator will be more efficient than execution of the workload on a host processor, and schedule the workload for execution on the host processor if the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator.
Example 20 may include the at least one computer readable storage medium of Example 19, wherein the instructions, when executed, cause a computing device to apply one or more configurable rules to at least one of the one or more runtime usage notifications.
Example 21 may include the at least one computer readable storage medium of Example 20, wherein the instructions, when executed, cause a computing device to configure at least one of the one or more configurable rules at runtime.
Example 22 may include the at least one computer readable storage medium of Example 19, wherein the instructions, when executed, cause a computing device to register with a power hardware access layer for receipt of the one or more runtime usage notifications, and wherein the one or more usage notifications are to indicate one or more of user interaction activity, video encoding activity, video decoding activity, web browsing activity or touch boost activity.
Example 23 may include the at least one computer readable storage medium of any one of Examples 19 to 22, wherein the workload is to include an audio playback workload.
Example 24 may include the at least one computer readable storage medium of any one of Examples 19 to 22, wherein the hardware accelerator is to include one or more of an audio digital signal processor, a sensor or a graphics accelerator.
Example 25 may include a power efficiency apparatus comprising means for making a power efficiency determination at runtime based on one or more runtime usage notifications; means for scheduling a workload for execution on a hardware accelerator if the power efficiency determination indicates that execution of the workload on the hardware accelerator will be more efficient than execution of the workload on a host processor; and means for scheduling the workload for execution on the host processor if the power efficiency determination indicates that execution of the workload on the host processor will be more efficient than execution of the workload on the hardware accelerator.
Example 26 may include the apparatus of Example 25, wherein the means for making the power efficiency determination includes means for applying one or more configurable rules to at least one of the one or more runtime usage notifications.
Example 27 may include the apparatus of Example 26, further including means for configuring at least one of the one or more configurable rules at runtime.
Example 28 may include the apparatus of Example 25, further including means for registering with a power hardware access layer for receipt of the one or more runtime usage notifications, wherein the one or more usage notifications are to indicate one or more of user interaction activity, video encoding activity, video decoding activity, web browsing activity or touch boost activity.
Example 29 may include the apparatus of any one of Examples 25 to 28, wherein the workload is to include an audio playback workload.
Example 30 may include the apparatus of any one of Examples 25 to 28, wherein the hardware accelerator is to include one or more of an audio digital signal processor, a sensor or a graphics accelerator.
Techniques described herein may therefore enable better utilization of host processor capacity. Additionally, the techniques may be extended beyond single CPU-accelerator combinations to more complex SoCs having multiple CPUs and/or multiple accelerators. For example, high performance computing (HPC) systems and multi-player game applications may achieve greater power efficiency. Moreover, time spent transferring data between accelerators and CPUs may be minimized and fixed roles regarding data parallelism may be eliminated. Simply put, work distribution may be more power efficient using techniques described herein.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.