The present disclosure relates generally to processors and more particularly to power management for processors.
A processor is typically constrained to operate within a power budget, wherein the power budget is based on one or more of a variety of factors, such as a target battery life for a battery supplying the processor, thermal limitations to preserve a desired lifespan of the processor, programmable performance settings for the processor, and the like. For a processor including more than one compute unit (e.g., a processor including multiple processor cores), meeting the power budget can be achieved by managing power states of the compute units individually. For example, an operating system executing at the processor can place a compute unit that is experiencing low levels of processing activity into an idle state, whereby the compute unit consumes a relatively small amount of power but is not able to perform useful processing activity. The operating system can return the idle compute unit to an active state in response to identifying that the compute unit is required to perform a processing operation, such as processing of a message from another compute unit that has been targeted to the idle compute unit. However, transitions into and out of the idle state can consume an undesirable amount of power and make it difficult to meet the power budget while maintaining a desired level of processing activity at the processor.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
The processor 100 includes a plurality of compute units, wherein a compute unit is the unit of computation capable of executing a sequence of commands for the processor 100 under hardware or software control. Examples of compute units include processor cores, GPU compute units, and the like. In the example of processor 100, the compute units are processor cores 102-105. Each of the processor cores 102-105 includes an instruction pipeline having a fetch stage, decode stage, dispatch stage, a plurality of execution units, and a retire stage that together are configured to fetch and execute instructions in a pipelined fashion. For the example of
Each of the processor cores 102-105 can be individually and selectively placed in any of a number of power modes, wherein the power modes govern the amount of power supplied to the processor core and the corresponding speed with which the processor core can execute instructions. In some embodiments, the power modes include an active mode, wherein the processor core is supplied a nominal amount of power and executes instructions at a nominal rate, and one or more low-power modes, wherein the processor core is supplied a lower amount of power as compared to the nominal amount, and executes instructions at a lower rate than the nominal rate or, in the case of an idle mode, does not execute instructions.
In the example of
In addition, the PMM 110 can set the power modes for the processor cores 102-105 to meet a power/thermal (P/T) budget 112. As used herein, a power/thermal budget can refer to a power budget, a thermal budget, or a combination of both. The P/T budget 112 indicates a specified maximum amount of power that is to be consumed by the processor 100 over a specified amount of time. In some embodiments, the P/T budget 112 is expressed directly in terms of an amount of power. In other embodiments, the budget is expressed in terms of a specified thermal budget, indicating a maximum average temperature the processor 100 is allowed to operate at over the specified amount of time. Expressing the budget in this way can be useful when the primary goal for the P/T budget 112 is to preserve a specified lifespan of the processor 100. In either case, the PMM 110 can measure the power consumed by the modules of the processor 100, the temperature at one or more locations of an integrated circuit incorporating the processor 100, or a combination thereof, and based on these measurements adjust the power modes of the processor cores 102-105 to ensure that the processor 100 does not exceed the P/T budget 112.
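As a rough illustration of the budget check described above, the following Python sketch treats the P/T budget as a maximum average power over a fixed window of measurements. The class name `PowerBudgetMonitor`, the sample-based window, and the use of watts are assumptions made for illustration and are not details of the disclosure.

```python
from collections import deque


class PowerBudgetMonitor:
    """Hypothetical model of a budget expressed as average power over a window."""

    def __init__(self, budget_watts, window_samples):
        self.budget_watts = budget_watts
        # Keep only the most recent `window_samples` power measurements.
        self.samples = deque(maxlen=window_samples)

    def record(self, watts):
        """Record one measured power sample."""
        self.samples.append(watts)

    def average_power(self):
        """Average power over the current window (0.0 if no samples yet)."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def would_exceed(self, extra_watts):
        """Predict whether the next sample, raised by `extra_watts`
        (e.g., the assumed cost of waking a core), pushes the windowed
        average over the budget."""
        last = self.samples[-1] if self.samples else 0.0
        projected = deque(self.samples, maxlen=self.samples.maxlen)
        projected.append(last + extra_watts)
        return sum(projected) / len(projected) > self.budget_watts
```

A power manager modeled this way could call `would_exceed` before changing a power mode and choose a lower-power alternative when the prediction is true.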
As indicated above, the processor cores 102-105 execute threads of computer programs. In many cases, these threads interact with other modules of the processor 100, including threads executing at other processor cores, input/output (I/O) circuitry (not shown) of the processor 100, memory controllers (not shown) of the processor 100, and the like. The threads interact with the other modules via sets of information generally referred to herein as messages. Examples of messages include interrupts, monitor wait (MWAIT) instructions, and the like. In some embodiments, a message can be any wakeup event that would cause a processor core to be awoken from an idle or other low-power state to an active state. The processor 100 includes a message controller 115 that is generally configured to monitor busses, interfaces, and other modules to identify messages at the processor 100. At least some of the messages will be targeted to one of the processor cores 102-105—that is, a message will indicate, via a field of the message or other identifier, that its destination is one of the processor cores 102-105. The message controller 115 provides such messages to the PMM 110.
In response to receiving a message from the message controller 115, the PMM 110 identifies the processor core targeted by the message. If the processor core targeted by the message is in the active state, the PMM 110 provides the message to the processor core for servicing. If the processor core targeted by the message is in one of a set of specified low-power states, the PMM 110 identifies other processor cores that are in the active state, selects one of the other processor cores, and provides the message to the selected processor core for servicing. As described further herein, the PMM 110 can select the processor core based on any of a number of criteria, such as whether the processor core is stalled, performance characteristics of the processor core relative to other processor cores that are in the active mode, and the like. By redirecting the message to an active, relatively less efficient processor core, the PMM 110 is able to maintain the targeted processor core in the idle (or other low-power) state, thereby avoiding the power costs of transitioning the targeted processor core to the active state. In some embodiments, the PMM 110 redirects messages from an idle processor core only in response to identifying that transitioning the processor core to the active state to service the message would cause, or is predicted to cause, the processor 100 to exceed the power/thermal budget 112.
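The routing decision described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `PowerMode`, `Core`, and `route_message`, the `wake_exceeds_budget` flag, and the `select_core` callback are hypothetical and stand in for whatever selection criteria and budget prediction an embodiment uses.

```python
from dataclasses import dataclass
from enum import Enum


class PowerMode(Enum):
    ACTIVE = "active"
    IDLE = "idle"


@dataclass
class Core:
    core_id: int
    mode: PowerMode


def route_message(target_id, cores, wake_exceeds_budget, select_core):
    """Return the id of the core that should service a message.

    cores: dict mapping core id -> Core
    wake_exceeds_budget: assumed result of a budget prediction for waking
        the targeted core
    select_core: callable that picks one Core from a list of active cores
    """
    target = cores[target_id]
    if target.mode is PowerMode.ACTIVE:
        # The targeted core is active, so it services its own message.
        return target_id
    active = [c for c in cores.values() if c.mode is PowerMode.ACTIVE]
    if not active or not wake_exceeds_budget:
        # No redirection: the targeted core would be woken instead.
        return target_id
    # Redirect to an active core so the targeted core can stay idle.
    return select_core(active).core_id
```

Passing `wake_exceeds_budget=True` unconditionally models the embodiments that always redirect messages targeted to an idle core.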
In some embodiments, a processor core can service a message targeted to another processor core only if it can be placed in a similar architectural state as the targeted processor core. Accordingly, the processor 100 includes a memory 120 that stores architectural states 122 for the processor cores 102-105. In response to specified checkpoints for a processor core, such as when a processor core enters the idle mode, the PMM 110 stores the architectural state of the processor core (e.g., the contents of the register file and other state information) to the architectural states 122. The PMM 110 can restore the architectural state to the processor core in response to other specified events, such as the processor core transitioning from the idle state to an active state. In addition, and as explained further below, in response to selecting an active processor core to service a message targeted to a processor core in the idle mode, the PMM 110 can store the architectural state for the selected processor core to the memory 120, then load the architectural state for the targeted processor core to the selected processor core. The selected processor core thereby becomes a logical replica of the targeted processor core, so that the selected processor core services the message in the same way, with the same result, as if it had been processed at the targeted processor core.
After the message has been serviced, the PMM 110 can then store the architectural state for the selected processor core to the memory 120 and, when the targeted processor core exits the idle state, transfer the architectural state from the memory 120 to the targeted processor core. The targeted processor core is thus put into the architectural state it would have had if it had serviced the message. The PMM 110 thereby is able to redirect messages to active processor cores without affecting the servicing of messages.
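The save, load, service, and store-back sequence described above can be illustrated with the following Python sketch. The `Core` class, the dictionary standing in for the memory 120, and the `handler` callback are hypothetical simplifications; restoring the selected core's own state at the end is included so the sketch is complete.

```python
class Core:
    """Hypothetical core whose architectural state is a dict of registers."""

    def __init__(self, core_id, state):
        self.core_id = core_id
        self.state = state


def service_on_behalf(selected, target_id, memory, handler):
    """Service a message for an idle target core on an active selected core.

    memory: dict mapping core id -> checkpointed architectural state
    handler: callable that services the message by updating a state dict
    """
    # 1. Save the selected core's own architectural state.
    memory[selected.core_id] = selected.state
    # 2. Load the idle target's checkpointed state into the selected core.
    selected.state = memory[target_id]
    # 3. Service the message as if running on the targeted core.
    handler(selected.state)
    # 4. Store the (possibly modified) target state back to memory, where
    #    the targeted core can pick it up when it exits the idle state.
    memory[target_id] = selected.state
    # 5. Restore the selected core's own state so it can resume its thread.
    selected.state = memory[selected.core_id]
```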
To select the processor core for redirection, the PMM 110 reviews the processing status of the threads being executed at each of the processor cores 102-105 and determines that the processor core 102 is in a stalled state. The stalled state can result from any of a number of conditions, such as a data dependency in the thread being executed causing the thread to stall as it awaits processing of the instruction (at a different thread or processor core) upon which an instruction of the thread depends. In some embodiments, the PMM 110 can identify the stalled state of the processor core 102 based on the processor core 102 setting a flag or other identifier that it is stalled while awaiting execution of an instruction of another thread. In other embodiments, the PMM 110 can identify the stalled state of the processor core 102 based on performance characteristics recorded at the performance monitor 116. For example, the PMM 110 can identify that the processor core 102 is in a stalled state in response to the IPC for the processor core 102, as recorded at the performance monitor 116, falling below a threshold value.
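As a hedged sketch of the two detection approaches described above, the following Python function reports a core as stalled when either a stall flag is set or its recorded IPC falls below a threshold. Combining both signals in one function, the function name `find_stalled_cores`, and the numeric values in the usage note are assumptions for illustration.

```python
def find_stalled_cores(ipc_by_core, stall_flags, ipc_threshold):
    """Return ids of cores that appear stalled.

    ipc_by_core: dict mapping core id -> recorded instructions per cycle
    stall_flags: dict mapping core id -> True if the core flagged a stall
    ipc_threshold: IPC value below which a core is treated as stalled
    """
    stalled = []
    for core_id, ipc in ipc_by_core.items():
        if stall_flags.get(core_id, False) or ipc < ipc_threshold:
            stalled.append(core_id)
    return stalled
```

For example, with hypothetical IPC values `{103: 1.8, 104: 0.1, 105: 1.2}`, a flag set only for core 105, and a threshold of 0.25, the function returns `[104, 105]`.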
In response to identifying the stalled processor core, the PMM 110 provides the interrupt 230 to that processor core, where the interrupt 230 is serviced. In particular, the stalled processor core services the interrupt in the same fashion, to achieve the same result, as if the interrupt 230 had been serviced at the processor core to which it was originally targeted. The targeted processor core is maintained in the idle state while the interrupt 230 is serviced, thereby conserving power.
In some embodiments, more than one of the processor cores 103-105 may be in the stalled state when the interrupt 230 is received by the PMM 110. The PMM 110 can select from among the stalled processor cores based on any of a variety of criteria, such as the length of time each processor core has been in the stalled state (e.g., selecting the processor core that has been in the stalled state the least amount of time), a confidence value indicating the likelihood that the processor core is in fact in the stalled state, the interrupt handler execution speed among processor cores in the stalled state, and other factors.
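One possible way to combine the criteria listed above is sketched below in Python. The particular ordering (filter by a minimum confidence, then prefer the shortest stall time, then the fastest interrupt handler) is an assumption chosen for illustration, as are the names `StalledCore` and `pick_stalled_core`; the disclosure does not prescribe how the criteria are weighted.

```python
from dataclasses import dataclass


@dataclass
class StalledCore:
    core_id: int
    stalled_cycles: int    # how long the core has been stalled
    confidence: float      # likelihood the core is actually stalled (0..1)
    handler_speed: float   # relative interrupt handler execution speed


def pick_stalled_core(candidates, min_confidence=0.5):
    """Pick one stalled core, or return None if no candidate qualifies."""
    eligible = [c for c in candidates if c.confidence >= min_confidence]
    if not eligible:
        return None
    # Shortest stall time first; faster handler breaks ties.
    best = min(eligible, key=lambda c: (c.stalled_cycles, -c.handler_speed))
    return best.core_id
```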
To select the processor core for redirection, the PMM 110 reviews the processing efficiency for the threads being executed at each of the processor cores 102-105. The processing efficiency can be indicated by any of a number of performance characteristics for each processor core, or a combination thereof, as recorded at the performance monitor 116. For example, the processing efficiency can be indicated by the IPC for each processor core, the instruction retirement rate for each processor core, a moving average of the number of idle cycles for each processor core, a moving average of the number of stalls for each processor core, and the like. In the depicted example, the processing efficiency is indicated as a percentage of the number of active or “useful” cycles of execution for the processor core over a specified span of time. However, in other embodiments the processing efficiency for each processor core may be indicated differently, such as by a raw number of idle cycles or other value.
In the depicted example, the processor core 104 has the lowest processing efficiency value, indicating that the thread it is executing is the least efficient of the threads being executed at active processor cores. In response to identifying that the processor core 104 has the lowest processing efficiency, the PMM 110 provides the interrupt 331 to the processor core 104 for servicing while the processor core 102 is maintained in the idle state.
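The efficiency-based selection can be sketched as follows, assuming the percentage-of-useful-cycles metric described above. The function names and the cycle counts in the usage note are hypothetical; an embodiment could equally use IPC, retirement rate, or a moving average of stalls.

```python
def efficiency_percent(useful_cycles, total_cycles):
    """Processing efficiency as the percentage of useful cycles in a span."""
    return 100.0 * useful_cycles / total_cycles


def least_efficient_core(cycle_counts):
    """Return the id of the active core with the lowest efficiency.

    cycle_counts: dict mapping core id -> (useful_cycles, total_cycles)
    """
    return min(
        cycle_counts,
        key=lambda core_id: efficiency_percent(*cycle_counts[core_id]),
    )
```

With hypothetical counts `{103: (900, 1000), 104: (350, 1000), 105: (700, 1000)}`, the function selects core 104, matching the case in which the processor core 104 has the lowest efficiency value.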
At a subsequent time 437, the PMM 110 determines to redirect an interrupt targeted to the processor core 102, still in the idle state, to the processor core 103 for servicing. In response, the PMM 110 stores the architectural state information for the processor core 103, designated architectural state 441, to the memory 120. The PMM 110 then loads the architectural state 440 to the processor core 103. The processor core 103 is thereby made logically equivalent to the processor core 102. The processor core 103 then services the interrupt as if it were being serviced at the processor core 102 to which it was originally targeted.
At a subsequent time 438, the PMM 110 identifies that the processor core 103 has completed servicing of the interrupt. In response, the PMM 110 causes the processor core 103 to store the architectural state 440 to the memory 120. This architectural state 440 may have been modified based on the servicing of the interrupt. The PMM 110 then loads the architectural state 441 to the processor core 103, thereby returning the processor core 103 to its state prior to servicing the interrupt and allowing the processor core 103 to continue executing any thread it was executing prior to servicing the interrupt.
At a later time (not shown at
At block 506 the PMM 110 receives a message from the message controller 115, wherein the message is targeted to the processor core 103. In response to identifying that the processor core 103 is in the idle state, at block 508 the PMM 110 selects an active processor core to service the message. At block 510 the PMM 110 stores the architectural state for the selected processor core to the memory 120 and, at block 512, the PMM 110 loads the architectural state for the idle processor core 103 to the selected processor core. At block 514, the selected processor core services the message while the processor core 103 is maintained in the idle state, thereby conserving power at the processor 100.
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software comprises one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other volatile or non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
A computer readable storage medium may include any storage medium, or combination of storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.