The present disclosure pertains to managing the power consumption of processors and, in particular, to mechanisms that may allow software to control power consumption at a fine granularity.
Power management is an important aspect of processors. Power management may reduce the power consumption of processors, and thus reduce energy costs and extend battery life. However, power management mechanisms may also have costs. For example, power management may reduce microprocessor performance and may stall an application when the application tries to use a processor unit that has been powered off. For these reasons, systems that incorporate power management mechanisms may predict the behavior of the applications being executed in order to reduce power consumption or to power off units that may not be needed while keeping powered on the units that will be used.
Embodiments are illustrated by way of example and not limitation in the Figures of the accompanying drawings:
Embodiments of the present invention may include a computer system as shown in
In one embodiment, the processor 102 includes a Level 1 (L1) internal cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache. Alternatively, in another embodiment, the cache memory can reside external to the processor 102. Other embodiments can also include a combination of both internal and external caches depending on the particular implementation and needs. Register file 106 can store different types of data in various registers including integer registers, floating point registers, status registers, and instruction pointer register.
Execution unit 108, including logic to perform integer and floating point operations, also resides in the processor 102. The processor 102 may also include a microcode (ucode) ROM that stores microcode for certain macroinstructions. For one embodiment, execution unit 108 includes logic to handle a packed instruction set 109. By including the packed instruction set 109 in the instruction set of a general-purpose processor 102, along with associated circuitry to execute the instructions, the operations used by many multimedia applications may be performed using packed data in a general-purpose processor 102. Thus, many multimedia applications can be accelerated and executed more efficiently by using the full width of a processor's data bus for performing operations on packed data. This can eliminate the need to transfer smaller units of data across the processor's data bus to perform one or more operations one data element at a time.
Alternate embodiments of an execution unit 108 can also be used in microcontrollers, embedded processors, graphics devices, DSPs, and other types of logic circuits. System 100 includes a memory 120. Memory 120 can be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, a flash memory device, or another memory device. Memory 120 can store instructions and/or data represented by data signals that can be executed by the processor 102.
A system logic chip 116 may be coupled to the processor bus 110 and memory 120. The system logic chip 116 in the illustrated embodiment is a memory controller hub (MCH). The processor 102 can communicate with the MCH 116 via a processor bus 110. The MCH 116 provides a high bandwidth memory path 118 to memory 120 for instruction and data storage and for storage of graphics commands, data and textures. The MCH 116 is to direct data signals between the processor 102, memory 120, and other components in the system 100 and to bridge the data signals between processor bus 110, memory 120, and system I/O 122. In some embodiments, the system logic chip 116 can provide a graphics port for coupling to a graphics controller 112. The MCH 116 is coupled to memory 120 through a memory interface 118. The graphics controller 112 is coupled to the MCH 116 through an Accelerated Graphics Port (AGP) interconnect 114.
System 100 uses a proprietary hub interface bus 122 to couple the MCH 116 to the I/O controller hub (ICH) 130. The ICH 130 provides direct connections to some I/O devices via a local I/O bus. The local I/O bus is a high-speed I/O bus for connecting peripherals to the memory 120, chipset, and processor 102. Some examples are the audio controller, firmware hub (flash BIOS) 128, wireless transceiver 126, data storage 124, legacy I/O controller containing user input and keyboard interfaces, a serial expansion port such as Universal Serial Bus (USB), and a network controller 134. The data storage device 124 can comprise a hard disk drive, a floppy disk drive, a CD-ROM device, a flash memory device, or other mass storage device.
For another embodiment of a system, an instruction in accordance with one embodiment can be used with a system on a chip. One embodiment of a system on a chip comprises a processor and a memory. The memory for one such system is a flash memory. The flash memory can be located on the same die as the processor and other system components. Additionally, other logic blocks such as a memory controller or graphics controller can also be located on a system on a chip.
Embodiments of the present invention may include a processor including a core and a dedicated control register having stored thereon data indicating a power management state of the core.
Embodiments of the present invention may include a processor including at least one power domain, each power domain including at least one core that switchably receives a power supply from a voltage regulator and switchably receives a clock signal from a clock source; a cache; and at least one dedicated control register having stored thereon data indicating power management states of the at least one power domain and the cache.
Embodiments of the present invention may include a processor including (1) a first block of control registers having stored thereon first data indicating power management states of power domains of the processor; (2) a second block of control registers having stored thereon second data indicating power management states of one or more caches of the processor; and (3) a third block of control registers having stored thereon third data indicating power management states of each core in the power domains of the processor.
Embodiments of the present invention may include a method including: in response to a request for a power management state of a hardware unit in a processor, retrieving the power management state from a corresponding control register; computing a target power management state for the hardware unit based on the retrieved power management state for the hardware unit; and storing the target power management state to the corresponding control register.
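The retrieve, compute, and store steps of the method above may be sketched as follows. This is a minimal illustration only: the dictionary standing in for the control registers, the state encoding (1 for powered on, 0 for powered off), and the policy in `compute_target_state` are all assumptions made for the example and are not taken from the disclosure.

```python
# Hypothetical software model: a dict stands in for the control registers.
POWERED_OFF, POWERED_ON = 0, 1  # assumed encoding


def compute_target_state(current_state, unit_needed):
    """Hypothetical policy: change state only when it differs from the need."""
    if unit_needed and current_state == POWERED_OFF:
        return POWERED_ON
    if not unit_needed and current_state == POWERED_ON:
        return POWERED_OFF
    return current_state


def manage_unit(registers, unit, unit_needed):
    """Retrieve the state, compute a target state, and store it back."""
    current = registers[unit]                             # retrieve
    target = compute_target_state(current, unit_needed)   # compute
    registers[unit] = target                              # store
    return target


registers = {"core0": POWERED_ON, "core1": POWERED_OFF}
manage_unit(registers, "core0", unit_needed=False)
manage_unit(registers, "core1", unit_needed=True)
```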
Power management may be achieved by clock gating and power gating either the cores or the cache. Clock gating is a method of disabling the clock signal (CLK) supplied to a core during a gated period of time, thereby eliminating active power consumption. While clock gating may eliminate active power consumption, it does not eliminate DC power consumption. Thus, a clock-gated core may still “leak” power while the clock signal is disabled. Power gating stops the power supply to a core, and thus eliminates all power consumption of the core. However, power gating a core may destroy the state of the core as well, which may stall the core and require a “wake-up” period when the core is later to be used again. To avoid the stall caused by power gating, software applications may need to ensure that all hardware units they use are activated in advance of their actual usage.
The voltage regulator 210 may include a control input 236 that may receive a voltage control word that may include one or more bits. Based on the bits of the voltage control word, voltage regulator 210 may be set to either normal voltage operation or the power gated state. Further, the voltage control word may include one or more bits to set the value of the voltage Vdd supplied to the cores. For example, the voltage Vdd may be set within a range of 1-2 volts. Similarly, the clock source 238 may include a control input 240 that may receive a clock control word that may include one or more bits. Based on the bits of the clock control word, clock source 238 may be set to either normal clock operation or the clock gated state. Further, the clock control word may include one or more bits to set the clock rate supplied to the cores. The clock rate may be within a range that is less than or equal to a maximum clock rate.
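One way such a voltage control word might be encoded is sketched below. The layout is hypothetical: one gate bit in the least significant position followed by a 4-bit code that maps linearly onto the example 1-2 volt range. The field width, bit order, and linear mapping are assumptions chosen for illustration, and a clock control word could be modeled the same way with a rate code bounded by the maximum clock rate.

```python
VDD_MIN, VDD_MAX = 1.0, 2.0        # example range from the text, in volts
VDD_BITS = 4                       # assumed width of the voltage field
VDD_STEPS = (1 << VDD_BITS) - 1    # highest voltage code (15)


def encode_voltage_word(power_gated, vdd_volts):
    """Pack a gate bit and a quantized Vdd setting into one control word."""
    if not VDD_MIN <= vdd_volts <= VDD_MAX:
        raise ValueError("Vdd outside the supported range")
    code = round((vdd_volts - VDD_MIN) / (VDD_MAX - VDD_MIN) * VDD_STEPS)
    return (code << 1) | int(power_gated)


def decode_voltage_word(word):
    """Return (power_gated, vdd_volts) for a control word."""
    power_gated = bool(word & 1)
    code = (word >> 1) & VDD_STEPS
    vdd_volts = VDD_MIN + code * (VDD_MAX - VDD_MIN) / VDD_STEPS
    return power_gated, vdd_volts
```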
Clock gating and power gating may be achieved by switches that control the supply of clock signal (CLK) or power (Vdd) in each domain. As shown in
A power management mechanism may also manage the usage of caches. Caches at all levels in the memory hierarchy may have the capability of disabling individual lines and/or ways to adjust the capacity and associativity of the cache, so that power consumption objectives can be met based on the needs of the application. As shown in
Selected cache lines may be disabled in conjunction with reconfiguration of the hit/miss logic of the cache. For example, in an embodiment, half of the cache lines may be turned off in response to the status of an indicator bit, making the cache appear to the outside as one having half of the original capacity.
In an alternative embodiment, the cache lines or ways may be disabled by requiring the software application to refrain from issuing any memory references to the disabled lines or ways. In yet another alternative embodiment, cache lines or ways may be disabled by clock gating (e.g., disabling the clock to the logic that drives the lines or ways), by power gating (e.g., removing the power supply to the lines or ways, which may destroy data stored in the lines or ways), or by a “drowsy cache” technique, i.e., retaining data stored in the lines or ways but requiring a “wake-up” period before the lines or ways may be used again.
Embodiments of the present invention may also include a power management mechanism that controls the power and clock supplies to components inside each core. As shown in
As discussed above, the power management mechanisms may have different costs and benefits. Changing the supply voltage and clock rate of certain domains may yield energy savings because of the quadratic relationship between supply voltage and power consumption. Clock gating may be turned on and off quickly, often in a single clock cycle. However, clock gating only reduces active power consumption, leaving leakage power untouched. Power gating may completely eliminate a circuit unit's power consumption, but any important state information in the circuit unit may need to be saved when the circuit is power gated off and restored when it is powered back on. The saving and restoration of state information may impose a performance and energy cost on power gating. Therefore, to achieve optimal power management, an application may need to solve complex control problems, taking into consideration not only cores and caches as a whole but also components within each core. This may require the application to have easy access to the status of each core and cache, and the components therein. Also, the application may need an interface to easily change the power operational states of domains, cores, caches and components in a CPU. Embodiments of the present invention provide a set of control registers 236 having stored thereon data indicating the power management states of each hardware unit. Because of the set of dedicated control registers 236, software programs may easily access, including read or write, the power management states of hardware units.
Embodiments of the present invention may create a register interface in a processor including a set of memory-mapped control registers that allow a software application to interact with hardware components for power management purposes. In one embodiment, the control registers are dedicated to storing power management states of hardware units.
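The quadratic relationship between supply voltage and power consumption mentioned above can be illustrated with the common first-order model for dynamic power in CMOS logic, P = C·V²·f. The model and the parameter names below are standard textbook assumptions rather than details given in the disclosure, and the sketch ignores leakage power.

```python
def dynamic_power(c_eff_farads, vdd_volts, freq_hz):
    """First-order dynamic power estimate: P = C * V^2 * f (leakage ignored)."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


def savings_fraction(vdd_old, vdd_new):
    """Fraction of dynamic power saved by changing Vdd at a fixed clock rate."""
    return 1.0 - (vdd_new / vdd_old) ** 2
```

Under this model, lowering Vdd from 2 volts to 1 volt at an unchanged clock rate would cut dynamic power to one quarter, a saving of 75 percent.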
The first block 302 of registers may include one or more registers 302.1-302.N, each of which may indicate the power management status of a corresponding power domain. In one embodiment, each of the one or more registers may further include a first bit for indicating power gate status and a second bit for indicating clock gate status. For example, register 302.1 may include a first bit 314.1, which indicates that domain 0 should be power gated if the first bit is ON (or = “1”) and should not be power gated if the first bit is OFF (or = “0”). Register 302.1 may include a second bit 314.2, which indicates that domain 0 should be clock gated if the second bit is ON and should not be clock gated if the second bit is OFF. Register 302.1 may further include third bits 314.3 indicating the voltage of Vdd, and fourth bits 314.4 indicating a clock rate for CLK. Therefore, each domain may set its own Vdd and/or CLK. In one embodiment, bits 314.1 and 314.3 may form the voltage control word that may be supplied to the control input (such as 236) of the voltage regulator (such as 210), and bits 314.2 and 314.4 may form the clock control word that may be supplied to the control input (such as 240) of the clock source (such as 238).
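A software view of one such domain register might look like the sketch below. The text names the fields 314.1 through 314.4 but does not give bit positions or widths, so the layout here (bit 0, bit 1, then two 4-bit fields) is purely an assumption, as are the names `DomainState`, `pack`, `unpack`, and `control_words`.

```python
from dataclasses import dataclass

FIELD_BITS = 4                      # assumed width of the Vdd and CLK fields
FIELD_MASK = (1 << FIELD_BITS) - 1


@dataclass
class DomainState:
    power_gated: bool   # first bit 314.1 (ON means power gated)
    clock_gated: bool   # second bit 314.2 (ON means clock gated)
    vdd_code: int       # third bits 314.3
    clk_code: int       # fourth bits 314.4


def pack(state):
    """Pack the four fields into one register value (assumed layout)."""
    return (int(state.power_gated)
            | (int(state.clock_gated) << 1)
            | ((state.vdd_code & FIELD_MASK) << 2)
            | ((state.clk_code & FIELD_MASK) << (2 + FIELD_BITS)))


def unpack(value):
    """Recover the four fields from a register value."""
    return DomainState(
        power_gated=bool(value & 1),
        clock_gated=bool((value >> 1) & 1),
        vdd_code=(value >> 2) & FIELD_MASK,
        clk_code=(value >> (2 + FIELD_BITS)) & FIELD_MASK,
    )


def control_words(value):
    """Split a register into (voltage control word, clock control word)."""
    s = unpack(value)
    voltage_word = int(s.power_gated) | (s.vdd_code << 1)   # 314.1 + 314.3
    clock_word = int(s.clock_gated) | (s.clk_code << 1)     # 314.2 + 314.4
    return voltage_word, clock_word
```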
The second block 304 may include a first register 304.1 for ways in the top-level cache (L3 level, e.g., cache 204 as shown in
The third block 306 of registers may include one or more registers 306.1-306.N, each of which may include the power management status of a corresponding core. In one embodiment, each register may include a plurality of bits for indicating the power management status of components inside the core. For example, in one embodiment, a register may include bits for cache ways disable 316.1, cache lines disable 316.2, core power gate 316.3, core clock gate 316.4, IALU power gate 316.5, IALU clock gate 316.6, FALU power gate 316.7, FALU clock gate 316.8, MALU power gate 316.9, and MALU clock gate 316.10. Bits 316.1 and 316.2 may indicate enablement/disablement of ways and lines of caches inside the corresponding core. Bits 316.3 and 316.4 may respectively indicate power gate and clock gate states of the core. Bits 316.5 and 316.6 may respectively indicate power gate and clock gate states of the IALU of the core. Bits 316.7 and 316.8 may respectively indicate power gate and clock gate states of the FALU of the core. Bits 316.9 and 316.10 may respectively indicate power gate and clock gate states of the MALU of the core. Therefore, a register in the third block may indicate the power management status of a core including components therein.
Software programs including both the operating system (OS) and applications may have access to the control register interface 300. In one embodiment, the OS may have the right to access all of the registers in the register interface 300 through a pointer 308. To access a register in the register interface 300, the OS may reference the address of the specific register via pointer 308. Applications, on the other hand, may only have the right to access some of the registers of the register interface 300. Therefore, applications may not directly reference each register of the register interface 300. Instead, the applications may access the register interface 300 through a thread and core mapping module 312, which may include a lookup table that maps an application-visible thread ID onto the set of control registers corresponding to the set of hardware executing the thread. First, the thread and core mapping module 312 may prevent the application from de-activating hardware that is in use by other applications, because the lookup table will block any attempt to affect hardware that is not allocated to the application. Second, the thread and core mapping module 312 may separate the resources that are visible to an application (or threads of the application) from the specific hardware being used to execute those threads. This separation may make it easy for the hardware and/or operating system to migrate these application threads among cores because the application does not need to know which core a thread is running on.
The OS and applications may issue load operations (i.e., reads from the register interface) that target these control registers in order to learn the current power management state of units in the system. The OS and applications may include a power management module that, based on the power management state of units in the system, calculates when to switch the power management state of a unit. The OS and applications may issue a store operation to the control registers in the register interface to change a hardware unit's power management configuration. For example, a store operation that writes a “1” to a bit of a control register in the register interface may instruct the corresponding hardware unit to start powering on or to start receiving a clock. Conversely, a store operation that writes a “0” to a bit of a control register in the register interface may instruct the corresponding hardware unit to start powering off or to stop receiving a clock.
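The per-core register described above might be manipulated from software with named bit masks, as in the sketch below. The ten bit names follow the list in the text, but their positions (bit 0 for 316.1 up to bit 9 for 316.10) and the helper functions are assumptions made for illustration.

```python
from enum import IntFlag


class CoreBits(IntFlag):
    """Assumed bit positions for the fields 316.1 through 316.10."""
    CACHE_WAYS_DISABLE = 1 << 0    # 316.1
    CACHE_LINES_DISABLE = 1 << 1   # 316.2
    CORE_POWER_GATE = 1 << 2       # 316.3
    CORE_CLOCK_GATE = 1 << 3       # 316.4
    IALU_POWER_GATE = 1 << 4       # 316.5
    IALU_CLOCK_GATE = 1 << 5       # 316.6
    FALU_POWER_GATE = 1 << 6       # 316.7
    FALU_CLOCK_GATE = 1 << 7       # 316.8
    MALU_POWER_GATE = 1 << 8       # 316.9
    MALU_CLOCK_GATE = 1 << 9       # 316.10


def set_bits(reg, bits):
    """Return the register value with the given bits set."""
    return reg | int(bits)


def clear_bits(reg, bits):
    """Return the register value with the given bits cleared."""
    return reg & ~int(bits)


def is_set(reg, bit):
    """Report whether a single bit is set in the register value."""
    return bool(reg & int(bit))


# Example: mark the floating point ALU as both power gated and clock gated.
reg = set_bits(0, CoreBits.FALU_POWER_GATE | CoreBits.FALU_CLOCK_GATE)
```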
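The role of the thread and core mapping module may be illustrated with a small software model. The class below is a hypothetical sketch: the name `ThreadCoreMapper`, its methods, and the use of `PermissionError` to represent a blocked access are assumptions, and an actual module 312 may be realized in hardware.

```python
class ThreadCoreMapper:
    """Hypothetical model of a lookup table from thread IDs to core registers."""

    def __init__(self, num_cores):
        self.core_registers = [0] * num_cores  # one control register per core
        self._table = {}                       # thread ID -> core index

    def assign(self, thread_id, core_index):
        """OS-side operation: allocate (or migrate) a thread to a core."""
        self._table[thread_id] = core_index

    def _resolve(self, thread_id):
        # Accesses by threads with no allocated hardware are blocked.
        if thread_id not in self._table:
            raise PermissionError("no hardware allocated to this thread")
        return self._table[thread_id]

    def read(self, thread_id):
        """Application-side load, addressed by thread ID only."""
        return self.core_registers[self._resolve(thread_id)]

    def write(self, thread_id, value):
        """Application-side store, addressed by thread ID only."""
        self.core_registers[self._resolve(thread_id)] = value


mapper = ThreadCoreMapper(num_cores=4)
mapper.assign(thread_id=7, core_index=0)
mapper.write(7, 0b1)                       # affects core 0's register
mapper.assign(thread_id=7, core_index=2)   # OS migrates the thread
mapper.write(7, 0b1)                       # same call now affects core 2
```

In this model the application never names a core, so the migration from core 0 to core 2 requires no change to the application's calls.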
In one embodiment, the OS and application software may issue a read operation to the register interface. The read operation may be implemented to query and return the actual power management state of the corresponding hardware unit. The actual power management state, in practice, may differ from the requested power management state that was stored to the corresponding control register. Such a discrepancy may occur in several situations. For example, when software issues a request for a unit to be powered on, a load operation of that control register may continue to return a state of “0” (off) until the unit has completely powered on and is available for use. Also, there may be situations where the hardware on its own decides to overrule a software request. For example, software may request that a processor be powered on when the processor is already at its thermal limit. In such a situation, the readable value of the control register may not change until the hardware is able to comply with the request. Depending on the implementation, attempts to use a unit before it is ready may stall the program or cause an application error.
In one embodiment, the status of registers in different register blocks may be inter-related. For example, if a domain is indicated as powered off, the cores within the domain would be indicated as powered off as well. Cores within the domain may be indicated as powered on only when the domain of the cores is powered on. Similarly, if a core is indicated as powered off, the hardware units within the core would be indicated as powered off as well. Hardware units within the core may be indicated as powered on only when the core of the hardware units is powered on.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
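The difference between a requested state and the actual state returned by a load may be sketched with a small simulation. Everything below is a hypothetical model: the class `PoweredUnit`, the notion of a fixed number of wake-up ticks, and the `thermally_limited` flag are assumptions used only to show why software might poll a control register before using a unit.

```python
class PoweredUnit:
    """Hypothetical unit whose register reads back the actual power state."""

    def __init__(self, wakeup_ticks, thermally_limited=False):
        self.wakeup_ticks = wakeup_ticks
        self.thermally_limited = thermally_limited
        self.requested = 0   # last value stored by software
        self.actual = 0      # value returned by loads
        self._remaining = 0

    def store(self, requested):
        """Software store: request power on (1) or power off (0)."""
        self.requested = requested
        if requested == 1 and self.actual == 0:
            self._remaining = self.wakeup_ticks
        elif requested == 0:
            self.actual = 0

    def tick(self):
        """Advance simulated hardware time by one step."""
        if self.thermally_limited:
            return  # hardware overrules the request for now
        if self.requested == 1 and self.actual == 0:
            self._remaining -= 1
            if self._remaining <= 0:
                self.actual = 1

    def load(self):
        """Software load: returns the actual state, not the requested one."""
        return self.actual


def wait_until_on(unit, max_polls):
    """Poll the register; return the poll count at success, or None on timeout."""
    for polls in range(max_polls):
        if unit.load() == 1:
            return polls
        unit.tick()
    return None
```

A caller in this model would store a 1 and then use `wait_until_on` with a bounded poll count, falling back to another unit or reporting an error if it returns `None`.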
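The inter-relation described above amounts to a hierarchy in which a unit can be indicated as powered on only if its core is on, and a core only if its domain is on. A minimal sketch of that rule follows; the dictionary-based representation and the function name `effective_states` are assumptions for illustration.

```python
def effective_states(domain_on, core_on, unit_on):
    """Resolve indicated power states through the domain/core/unit hierarchy.

    domain_on: {domain: bool}
    core_on:   {core: (domain, bool)}
    unit_on:   {unit: (core, bool)}
    """
    # A core is on only if both it and its domain are indicated as on.
    cores = {c: domain_on[d] and on for c, (d, on) in core_on.items()}
    # A unit is on only if both it and its (effective) core are on.
    units = {u: cores[c] and on for u, (c, on) in unit_on.items()}
    return cores, units


domains = {"d0": True, "d1": False}
cores_in = {"c0": ("d0", True), "c1": ("d0", False), "c2": ("d1", True)}
units_in = {"c0.falu": ("c0", True), "c1.falu": ("c1", True), "c2.ialu": ("c2", True)}
cores_out, units_out = effective_states(domains, cores_in, units_in)
```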