The teachings of the present disclosure relate generally to processor power consumption, and more particularly, to techniques for reducing processor power consumption according to needs of a particular instruction.
As capabilities increase and form factor is reduced, processor-based equipment (e.g., system-on-a-chip (SoC)) tends to exhibit more power consumption than legacy equipment. Accordingly, power efficiency for processor-based equipment is becoming increasingly important as processors evolve. Specific considerations are the reduction of thermal effects and energy conservation (e.g., reducing amount of power used during operation). Also, apart from energy conservation, power efficiency is a concern for battery-operated processor-based equipment, where it is desired to minimize battery size so that the equipment can be made small and lightweight.
CPUs may use higher clock frequencies (which may also require higher voltages and thus higher power consumption) than necessary for certain programs. Software-based techniques have been used to reduce processor power consumption; however, the effectiveness of such techniques is generally limited by inefficiencies. For example, software-based solutions cannot efficiently control frequency scaling due to ineffective sampling windows (e.g., the frequency at which data is monitored and the clock frequency is adjusted).
Thus, as the demand for power efficient processor-based equipment continues to increase, there exists a need for further improvements to the technology.
The following presents a simplified summary of one or more aspects of the present disclosure, in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated features of the disclosure, and is intended neither to identify key or critical elements of all aspects of the disclosure nor to delineate the scope of any or all aspects of the disclosure. Its sole purpose is to present some concepts of one or more aspects of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
Aspects of the disclosure relate to a method for dynamically scaling a clock frequency, the method comprising retrieving, by an advanced peripheral bus (APB) driver, a first one or more values from one or more registers of a core processor, the first one or more values corresponding to a set of instructions of the core processor. The method may also include determining, by an IPC calculator, a first expected instruction per cycle (IPC) for executing the set of instructions based on the first one or more values. The method may also include comparing, by the IPC calculator, a threshold IPC to the first expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a first register of the IPC calculator. If the equality condition is not met, the method may include determining, by the IPC calculator, a first scaling value, scaling, by a clock generator, a clock signal of the core processor according to the first scaling value, executing, by the core processor, the set of instructions using the clock signal scaled according to the first scaling value, and updating, by the core processor, the first one or more values of the one or more registers with a second one or more values. If the equality condition is met, executing, by the core processor, the set of instructions using the clock signal of the core processor.
Aspects of the disclosure relate to an apparatus, comprising: a memory; and a processor communicatively coupled to the memory, the processor configured to: retrieve a first one or more values from one or more registers of a core processor, the first one or more values corresponding to a set of instructions of the core processor. The processor may also be configured to determine a first expected instruction per cycle (IPC) for executing the set of instructions based on the first one or more values, and compare a threshold IPC to the first expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a first register of an IPC calculator. If the equality condition is not met, the processor is further configured to determine a first scaling value, scale a clock signal of the core processor according to the first scaling value, execute the set of instructions using the clock signal scaled according to the first scaling value, and update the first one or more values of the one or more registers with a second one or more values. If the equality condition is met, the processor is further configured to execute the set of instructions using the clock signal of the core processor.
Aspects of the disclosure relate to an apparatus, including means for retrieving a first one or more values from one or more registers of a core processor, the first one or more values corresponding to a set of instructions of the core processor. The apparatus may also include means for determining a first expected instruction per cycle (IPC) for executing the set of instructions based on the first one or more values. The apparatus may also include means for comparing a threshold IPC to the first expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a first register of an IPC calculator. If the equality condition is not met, the apparatus may include means for determining a first scaling value, means for scaling a clock signal of the core processor according to the first scaling value, means for executing the set of instructions using the clock signal scaled according to the first scaling value, and means for updating the first one or more values of the one or more registers with a second one or more values. If the equality condition is met, the apparatus may also include means for executing, by the core processor, the set of instructions using the clock signal of the core processor.
Aspects of the disclosure relate to a non-transitory computer-readable storage medium that stores instructions that when executed by a processor of an apparatus cause the apparatus to perform a method for dynamically scaling a clock frequency, including: retrieving a first one or more values from one or more registers of a core processor, the first one or more values corresponding to a set of instructions of the core processor. The method may also include determining a first expected instruction per cycle (IPC) for executing the set of instructions based on the first one or more values. The method may also include comparing a threshold IPC to the first expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a first register of an IPC calculator. If the equality condition is not met, the method may also include determining a first scaling value, scaling a clock signal of the core processor according to the first scaling value, executing the set of instructions using the clock signal scaled according to the first scaling value, and updating the first one or more values of the one or more registers with a second one or more values. If the equality condition is met, the method may also include executing the set of instructions using the clock signal of the core processor.
Aspects of the present disclosure provide means, apparatus, processors, and computer-readable media for performing techniques and methods for dynamically scaling a clock frequency at processor-based equipment.
To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the appended drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed.
So that the manner in which the above-recited features of the present disclosure can be understood in detail, a more particular description, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
The various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the invention or the claims.
While features of the present invention may be discussed relative to certain embodiments and figures below, all embodiments of the present invention can include one or more of the advantageous features discussed herein. In other words, while one or more embodiments may be discussed as having certain advantageous features, one or more of such features may also be used in accordance with various other embodiments discussed herein.
The term “system on chip” (SoC) is used herein to refer to a single integrated circuit (IC) chip that contains multiple resources and/or processors integrated on a single substrate. A single SoC may contain circuitry for digital, analog, mixed-signal, and radio-frequency functions. A single SoC may also include any number of general purpose and/or specialized processors (digital signal processors, modem processors, video processors, etc.), memory blocks (e.g., read-only memory (ROM), RAM, Flash, etc.), and resources (e.g., timers, voltage regulators, oscillators, etc.), any or all of which may be included in one or more cores.
A number of different types of memories and memory technologies are available or contemplated in the future, all of which are suitable for use with the various aspects of the present disclosure. Such memory technologies/types include phase change memory (PRAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), non-volatile random-access memory (NVRAM), flash memory (e.g., embedded multimedia card (eMMC) flash, flash erasable programmable read only memory (FEPROM)), pseudostatic random-access memory (PSRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM), and other random-access memory (RAM) and ROM technologies known in the art. A DDR SDRAM memory may be a DDR type 1 SDRAM memory, DDR type 2 SDRAM memory, DDR type 3 SDRAM memory, or a DDR type 4 SDRAM memory.
Each of the above-mentioned memory technologies includes, for example, elements suitable for storing instructions, programs, control signals, and/or data for use in or by a computer or other digital electronic device. Any references to terminology and/or technical details related to an individual type of memory, interface, standard or memory technology are for illustrative purposes only, and not intended to limit the scope of the claims to a particular memory system or technology unless specifically recited in the claim language. Mobile computing device architectures have grown in complexity, and now commonly include multiple processor cores, SoCs, co-processors, functional modules including dedicated processors (e.g., communication modem chips, global positioning system (GPS) processors, display processors, etc.), complex memory systems, intricate electrical interconnections (e.g., buses and/or fabrics), and numerous other resources that execute complex and power intensive software applications (e.g., video streaming applications, etc.).
The processing system 120 is interconnected with one or more controller module(s) 112, input/output (I/O) module(s) 114, memory module(s) 116, and system component and resources module(s) 118 via a bus module 110 which may include an array of reconfigurable logic gates and/or implement bus architecture (e.g., CoreConnect, advanced microcontroller bus architecture (AMBA), etc.). Bus module 110 communications may be provided by advanced interconnects, such as high performance networks on chip (NoCs). The interconnection/bus module 110 may include or provide a bus mastering system configured to grant SoC components (e.g., processors, peripherals, etc.) exclusive control of the bus (e.g., to transfer data in burst mode, block transfer mode, etc.) for a set duration, number of operations, number of bytes, etc. In some cases, the bus module 110 may implement an arbitration scheme to prevent multiple master components from attempting to drive the bus simultaneously.
The controller module 112 may be a specialized hardware module configured to manage the flow of data to and from the memory module 116, the processor memory 108, or a memory device located off-chip (e.g., a flash memory device). In some examples, the memory module may include a host device configured to receive various memory commands from multiple masters (e.g., processors and/or other modules), and address and communicate the memory commands to a memory device. The multiple masters may include processors 102, 104, and 106, and/or multiple applications running on one or more of the processors 102, 104, and 106. The controller module 112 may comprise one or more processors configured to perform operations disclosed herein. Examples of processors include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
The I/O module 114 is configured for communicating with resources external to the SoC 100. For example, the I/O module 114 includes an input/output interface (e.g., a bus architecture or interconnect) or a hardware design for performing specific functions (e.g., a memory, a wireless device, and a digital signal processor). In some examples, the I/O module 114 includes circuitry to interface with peripheral devices, such as a memory device located off-chip.
The memory module 116 is a computer-readable storage medium implemented in the SoC 100. The memory module 116 may provide one or more of a non-volatile storage (e.g., such as flash memory, ROM, etc.) or volatile storage such as a RAM (e.g., SRAM, DRAM, etc.), for one or more of the processing system 120, controller module 112, I/O module 114, and/or the system components and resources module 118. The memory module 116 may include a cache memory to provide temporary storage of information to enhance processing speed of the SoC 100. In some examples, the memory module 116 may be implemented as a universal flash storage (UFS) integrated into the SoC 100, or an external UFS card.
The SoC 100 may include a system components and resources module 118 for managing sensor data, analog-to-digital conversions, wireless data transmissions, and for performing other specialized operations (e.g., supporting interoperability between different devices). System components and resources module 118 may also include components such as voltage regulators, oscillators, phase-locked loops, peripheral bridges, data controllers, system controllers, access ports, timers, and other similar components used to support the processors and software clients running on the computing device. The system components and resources 118 may also include circuitry for interfacing with peripheral devices, such as cameras, electronic displays, wireless communication devices, external memory chips, etc.
Aspects of the present disclosure are directed to a hardware (HW) based solution for determining an instructions per cycle (IPC) for which a CPU is capable of executing a program, and scaling a clock of the CPU according to the calculated IPC to reduce power consumption. An example hardware implementation of the present disclosure is described in more detail below in reference to
In various aspects of the disclosure, the hardware configuration 200 may be part of any suitable system-on-a-chip (SoC) (e.g., SoC 100 of
In other examples, the hardware configuration 200 may be embodied by a wireless user equipment (UE). Examples of a UE include a cellular phone, a smart phone, a session initiation protocol (SIP) phone, a laptop, a notebook, a netbook, a smartbook, a personal digital assistant (PDA), a satellite radio, a global positioning system (GPS) device, a multimedia device, a video device, a digital audio player (e.g., MP3 player), a camera, a game console, an entertainment device, a vehicle component, a wearable computing device (e.g., a smart watch, a health or fitness tracker, etc.), an appliance, a sensor, a vending machine, or any other similar functioning device. The UE may also be referred to by those skilled in the art as a mobile station (MS), a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communications device, a remote device, a mobile subscriber station, an access terminal (AT), a mobile terminal, a wireless terminal, a remote terminal, a handset, a terminal, a user agent, a mobile client, a client, or some other suitable terminology.
In the example of
In certain aspects, the APB driver 208 is a master with access to the counter values of any one or more registers 206a-206n from any one or more core processors 204a-204n. The APB driver 208 may also be configured to read the counter values of the one or more registers 206a-206n based on a “per-core” active core signal. The active core signal 212 indicates which core processor 204a-204n is active, and triggers the APB driver 208 to fetch one or more register values from the registers of the active core processor.
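As a rough illustration of the per-core selection described above, the following Python sketch models the active core signal as one boolean per core and returns the counter values of only the active cores. The function name, the dictionary keys ("cycles", "instructions"), and the boolean encoding of the signal are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Dict, List, Sequence, Tuple


def fetch_active_counters(
    registers_per_core: Sequence[Dict[str, int]],
    active_core_signal: Sequence[bool],
) -> List[Tuple[int, Dict[str, int]]]:
    """Return (core index, counter values) for every core flagged active.

    registers_per_core[i] stands in for the registers of core i, and
    active_core_signal[i] is a hypothetical per-core "active" flag.
    """
    fetched = []
    for index, (regs, active) in enumerate(
        zip(registers_per_core, active_core_signal)
    ):
        if active:
            # Copy the values so later register updates do not alter the sample.
            fetched.append((index, dict(regs)))
    return fetched


cores = [
    {"cycles": 100, "instructions": 80},  # illustrative counter values
    {"cycles": 50, "instructions": 10},
]
sample = fetch_active_counters(cores, [False, True])
```

In this model, only the second core's counters are read because only its flag is set.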
In some examples, the APB driver 208 is configured to receive a hysteresis signal from a hysteresis timer 216 of an instruction per cycle (IPC) calculator 214. In some examples, the hysteresis timer 216 is a counter that is incremented every clock cycle (e.g., a clock cycle of a core processor 204a-204n), thereby providing a hysteresis signal to the APB driver 208 every clock cycle. It should be noted that in some examples, the hysteresis timer 216 is configurable, and may be adjusted to scale the frequency of the hysteresis signal to any suitable frequency. For example, the hysteresis timer 216 may provide a clock signal having a frequency that depends on the counter reaching a certain count value corresponding to a count of a number of clock cycles of a core processor 204a-204n. The frequency of the hysteresis signal may be configured to define a programming window indicative of how frequently the register values of the one or more registers 206a-206n of a corresponding core processor 204a-204n are fetched by the APB driver 208. For example, the hysteresis timer 216 may provide a binary signal (e.g., core processor 204a-204n clock signal or scaled clock signal) to the APB driver 208, where the frequency of that signal indicates how frequently the register values of a register 206a-206n of a corresponding core processor 204a-204n are fetched by the APB driver 208. For example, when the APB driver 208 receives a “1” signal (e.g., a high clock signal) from the hysteresis timer 216, the APB driver 208 fetches a register value corresponding to the active core indicated by the active core signal 212. In other words, the hysteresis timer 216 may provide a “read-enable” signal to the APB driver 208 that controls how often the APB driver 208 fetches register values, and therefore, how often the IPC calculator 214 and DCD calculator 222 scale the processor clock.
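The read-enable behavior described above can be modeled, in simplified form, as a counter that wraps at a configurable count value. The Python sketch below is only a behavioral model under that assumption; the class name, the `window` parameter, and the helper `sample_points` are hypothetical names introduced for illustration.

```python
from typing import List


class HysteresisTimer:
    """Counter that asserts a read-enable pulse once every `window` cycles."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1 cycle")
        self.window = window  # hypothetical name for the configurable count value
        self.count = 0

    def tick(self) -> bool:
        """Advance one core clock cycle; return True when the window elapses."""
        self.count += 1
        if self.count >= self.window:
            self.count = 0
            return True
        return False


def sample_points(timer: HysteresisTimer, cycles: int) -> List[int]:
    """List the cycle numbers (1-based) at which register values would be fetched."""
    return [cycle for cycle in range(1, cycles + 1) if timer.tick()]


fetch_cycles = sample_points(HysteresisTimer(window=4), 12)  # [4, 8, 12]
```

With a window of 1 the model fetches on every cycle, which corresponds to the unscaled case in which the hysteresis signal is provided every clock cycle.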
The processing system 202 and the APB driver 208 may be configured to communicate data bi-directionally via the APB 210 (e.g., the bus module 110 of
The IPC calculator 214 is configured to determine an “expected IPC” for executing a set of instructions based on the fetched register values. For example, the APB driver 208 may fetch a monitored power use and a clock cycle count from one or more of the first set of registers 206a, and provide the fetched values to the IPC calculator 214. The IPC calculator 214 receives the fetched values and calculates the expected IPC for executing one or more instructions of the set of instructions based on the fetched values. In some examples, the expected IPC can be determined based on vectors such as memory stall vectors and high activity vectors. For example, the core processor may include another set of registers which contain values indicating a number of gaps of time between executing instructions (e.g., memory stalls) or how many times an executed instruction has drawn an amount of power that is greater than a threshold.
The IPC calculator 214 can then utilize a comparator 220 configured to compare the expected IPC to a “threshold IPC” to determine whether the expected IPC is less than the threshold IPC. In some examples, the threshold IPC is a value stored in a register 218 of the IPC calculator, where the stored value is indicative of an average number of instructions executed by a core processor (e.g., the first core processor 204a) each clock cycle. By comparing the expected IPC with the threshold IPC, the IPC calculator 214 may determine if a system clock frequency of the active core processor (e.g., a clock frequency used by the active core processor to execute an instruction) should be scaled to reduce the amount of power used by the core processor in executing the instruction.
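A minimal Python sketch of the calculation and comparison follows. It assumes that the fetched values are an instruction count and a clock cycle count and that the expected IPC is their ratio; the disclosure does not fix a formula, so this ratio, the function names, and the handling of a zero cycle count are assumptions made for illustration.

```python
def expected_ipc(instruction_count: int, cycle_count: int) -> float:
    """Estimate IPC as instructions divided by cycles (assumed formula)."""
    if cycle_count <= 0:
        # No elapsed cycles in the window: report zero rather than divide by zero.
        return 0.0
    return instruction_count / cycle_count


def should_scale(expected: float, threshold: float) -> bool:
    """Model of the comparator: scale only when expected IPC is below threshold."""
    return expected < threshold


ipc = expected_ipc(instruction_count=600, cycle_count=1000)
scale_clock = should_scale(ipc, threshold=1.0)  # True, since 0.6 < 1.0
```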
In certain aspects, the IPC calculator 214 selects a frequency scaling value, and outputs the scaling value to a clock generator, (e.g., a differential clock divider (DCD) calculator 222). This process is described in more detail below in reference to
The DCD calculator 222 may then scale the system clock frequency of the active core processor based on the selected scaling value. This process is described in more detail below in reference to
In certain aspects, the comparator 220 may receive the calculated expected IPC and the stored threshold IPC. In a step process, the comparator 220 may cycle iteratively through a graduated series of frequency scaling values (e.g., Freq_sel[0]-Freq_sel[7], where Freq_sel[0] provides a minimum scaling of the system clock frequency, and Freq_sel[7] provides a maximum scaling of the system clock frequency) based on whether a comparison of the expected IPC and the stored IPC satisfies an equality condition. For example, Freq_sel[0] may provide a minimum scaling by reducing, or “muting” relatively more clock pulses than other frequency scaling values. As such, the resultant clock is scaled to allow only a minimum number of clock pulses. In the example illustrated in
Initially, when comparing the expected IPC to the threshold IPC, if the comparator 220 determines that the expected IPC is less than the threshold IPC, then the comparator 220 selects Freq_sel[0], and the IPC calculator 214 outputs a signal (e.g., selected scaling value) indicating the Freq_sel[0] scaling value to the DCD calculator 222. The DCD calculator 222 then outputs a clock enable (CLK_EN) signal to the active core processor, wherein the CLK_EN signal is configured to scale the system clock frequency of the active core processor based on matrix values corresponding to Freq_sel[0]. For example, the system clock may only be enabled if the CLK_EN signal is high.
The active core processor (e.g., core processor 204a) then executes the instruction using the scaled system clock frequency, and updates one or more values in the set of registers 206a accordingly. The next time the instruction is to be executed, the APB driver 208 fetches the updated register values and provides them to the IPC calculator 214. The IPC calculator 214 again calculates a new expected IPC based on the updated register values, and compares the new expected IPC to the stored threshold IPC. If the new expected IPC is still less than the threshold IPC, then the comparator 220 moves to the next scaling value, selecting Freq_sel[1], and the IPC calculator 214 outputs a signal indicating Freq_sel[1] to the DCD calculator 222. This step process continues until the comparator 220 determines that the expected IPC is greater than the threshold IPC, or until the last step (e.g., Freq_sel[7]) is reached.
Because Freq_sel[7] is the last stored scaling value, once it is reached, the next time the instruction is to be executed, the APB driver 208 fetches the updated register values and provides them to the IPC calculator 214. The IPC calculator 214 again calculates a new expected IPC based on the updated register values, and compares the new expected IPC to the stored threshold IPC. If the new expected IPC is still less than the threshold IPC, then the comparator 220 will reuse Freq_sel[7], and the IPC calculator 214 will output a signal indicating Freq_sel[7] to the DCD calculator 222.
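The gating effect of the CLK_EN signal can be pictured as a logical AND between the system clock and CLK_EN. The Python sketch below models both signals as lists of 0/1 samples; the sample values and the function name are hypothetical and serve only to illustrate that a clock pulse reaches the core when CLK_EN is high.

```python
from typing import List


def gated_clock(clk: List[int], clk_en: List[int]) -> List[int]:
    """AND each system clock sample with the matching CLK_EN sample."""
    return [c & e for c, e in zip(clk, clk_en)]


clk = [1, 0, 1, 0, 1, 0, 1, 0]     # four system clock pulses
clk_en = [1, 1, 0, 0, 1, 1, 0, 0]  # hypothetical enable pattern
gated = gated_clock(clk, clk_en)   # [1, 0, 0, 0, 1, 0, 0, 0]
```

In this example, two of the four pulses are suppressed, so the core sees half as many clock pulses over the same interval.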
If the IPC calculator 214 determines that the expected IPC is greater than the threshold IPC, then the IPC calculator 214 will perform a DCD bypass procedure, and the core processor will execute the instruction at a full clock speed (e.g., no clock scaling).
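The step process described above may be sketched in Python as a function that returns the next Freq_sel index, or None to indicate the DCD bypass. The function name and the use of None are assumptions of this sketch, and treating an expected IPC equal to the threshold as a bypass is likewise an assumption, since the passage above addresses only the less-than and greater-than cases.

```python
from typing import Optional

NUM_STEPS = 8  # Freq_sel[0] through Freq_sel[7]


def next_selection(
    expected_ipc: float, threshold_ipc: float, current: Optional[int]
) -> Optional[int]:
    """Return the next Freq_sel index, or None for the DCD bypass.

    `current` is the index used in the previous window, or None if the
    clock was not scaled in that window.
    """
    if expected_ipc >= threshold_ipc:
        return None  # bypass: run at full clock speed (equality is an assumption)
    if current is None:
        return 0  # first step of the graduated series
    # Advance one step, reusing the last stored value once it is reached.
    return min(current + 1, NUM_STEPS - 1)


first = next_selection(0.5, 1.0, None)  # 0
second = next_selection(0.5, 1.0, 0)    # 1
last = next_selection(0.5, 1.0, 7)      # 7 (reused)
bypass = next_selection(1.2, 1.0, 3)    # None
```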
The matrix 402 of the DCD calculator 222 includes a configurable number of rows and columns, wherein each row of the matrix 402 corresponds to a clock frequency scaling value (e.g., Sel[0]-Sel[7]). It should be noted that the number of rows and columns of the matrix 402 illustrated in
In one example, a first instruction may require only 4 clock cycles to execute, but the expected IPC may indicate that the first instruction requires 6 clock cycles. In this example, the comparator 220 may compare the expected IPC to a stored threshold IPC, cycling through frequency scaling values until the expected IPC is greater than the threshold IPC or until the last scaling value is reached.
As shown in the
Initially, at a first step 602, an APB driver (e.g., APB driver 208 of
At a second step 604, the APB driver 208 may retrieve a first value from a first register of the core processor 204a and a second value from a second register of the core processor 204a, wherein the first value and the second value correspond to a set of instructions to be executed by the core processor 204a. The first value and the second value may provide any suitable information about the core processor and the set of instructions. In one example, the first value is indicative of a clock cycle count (e.g., a number of clock cycles between a first instruction and a second instruction of the set of instructions during which the core processor is idle, or a number of system clock cycles over a duration of time), and the second value is indicative of a number of instructions executed during the clock cycle count, or indicative of power consumed by the corresponding core processor during execution of an instruction. In certain aspects, the APB driver 208 may pass the retrieved values to an IPC calculator (e.g., IPC calculator 214 of
At a third step 606, the IPC calculator 214 calculates an expected IPC for executing an instruction, where the expected IPC is calculated using the first value and the second value retrieved from the first register and the second register of the core processor 204a.
At a fourth step 608, the IPC calculator 214 compares a stored threshold IPC to the expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a third register of the IPC calculator. In some examples, the equality condition is not met if the expected IPC is less than the threshold IPC.
At a fifth step 610, the IPC calculator 214 may determine that the threshold condition is not met (e.g., the expected IPC is less than the threshold IPC).
If the threshold condition is not met, then the operations may proceed to a sixth step 612, where the IPC calculator 214 determines an initial scaling value of a plurality of scaling values. In this example, each of the plurality of scaling values may correspond to an M value indicative of an index of each respective scaling value. In this example, the initial scaling value is indexed as 1, and thus, M is initially equal to 1. In some examples, the scaling values are indexed such that the initial scaling value (e.g., M=1) is configured to scale the system clock cycle signal by fewer clock cycles relative to another scaling value (e.g., M>1). The IPC calculator 214 may signal the initial scaling value to a DCD calculator (e.g., DCD calculator 222 of
At a seventh step 614, the DCD calculator 222 may scale the system clock cycle signal of the core processor 204a according to the initial scaling value. For example, the DCD calculator 222 may output a CLK_EN signal to the core processor 204a, wherein the CLK_EN signal is configured to scale the system clock cycle signal according to the initial scaling value.
At an eighth step 616, the core processor 204a updates its register counter values based on the execution of the instruction. For example, the core processor 204a may update the first register with a third value, and update the second register with a fourth value.
At a ninth step 618, the IPC calculator 214 increments M.
If the IPC calculator 214 determines that the threshold condition is met (e.g., the expected IPC is greater than or equal to the threshold IPC) at the fifth step 610, then the operations may proceed to a tenth step 620, where the core processor 204a executes the set of instructions using the system clock cycle signal (e.g., the system clock cycle signal is not scaled).
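The sequence of steps 604 through 620 can be approximated by the following Python sketch, which consumes one (cycle count, instruction count) pair per sampling window and records the index M applied in each window. The function name, the input format, the ratio used for the expected IPC, and the clamping of M at the last scaling value are assumptions of this sketch rather than details confirmed by the disclosure.

```python
from typing import Iterable, List, Tuple


def run_flow(
    samples: Iterable[Tuple[int, int]],
    threshold_ipc: float,
    num_scaling_values: int,
) -> Tuple[List[int], bool]:
    """Return the M values applied per window and whether step 620 was reached."""
    applied: List[int] = []
    m = 1  # the initial scaling value is indexed as 1
    for cycles, instructions in samples:                          # step 604
        expected = instructions / cycles if cycles > 0 else 0.0   # step 606
        if expected >= threshold_ipc:                             # steps 608 and 610
            return applied, True                                  # step 620: unscaled clock
        applied.append(m)                                         # steps 612 to 616
        m = min(m + 1, num_scaling_values)                        # step 618 (clamped)
    return applied, False


windows = [(100, 40), (100, 60), (100, 90), (100, 120)]  # illustrative counter values
steps, unscaled = run_flow(windows, threshold_ipc=1.0, num_scaling_values=8)
# steps == [1, 2, 3], unscaled == True
```

In this run, the first three windows fall below the threshold, so M steps from 1 to 3; the fourth window meets the threshold and the model ends at the unscaled execution of step 620.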
The operations 700 may begin, at block 702, by retrieving, by an advanced peripheral bus (APB) driver, a first one or more values from one or more registers of a core processor, the first one or more values corresponding to a set of instructions of the core processor.
The operations 700 proceed to block 704 by determining, by an IPC calculator, a first expected instruction per cycle (IPC) for executing the set of instructions based on the first one or more values.
The operations 700 proceed to block 706 by comparing, by the IPC calculator, a threshold IPC to the first expected IPC to determine whether an equality condition is met, wherein the threshold IPC is stored in a first register of the IPC calculator.
The operations 700 may proceed to block 708, wherein if the equality condition is not met, the operations 700 are configured for determining, by the IPC calculator, a first scaling value, scaling, by a clock generator, a clock signal of the core processor according to the first scaling value, executing, by the core processor, the set of instructions using the clock signal scaled according to the first scaling value, and updating, by the core processor, the first one or more values of the one or more registers with a second one or more values.
The operations 700 may proceed to block 710, wherein if the equality condition is met, the operations 700 are configured for executing, by the core processor, the set of instructions using the clock signal of the core processor.
In certain aspects, operations 700 may include retrieving, by the APB driver, the second one or more values from the one or more registers. The operations 700 may also include determining, by the IPC calculator, a second expected IPC for executing the set of instructions based on the second one or more values. The operations 700 may also include comparing, by the IPC calculator, the threshold IPC to the second expected IPC to determine whether the equality condition is met.
If the equality condition is not met, the operations 700 may also include determining, by the IPC calculator, a second scaling value. The operations 700 may also include scaling, by the clock generator, the clock signal of the core processor according to the second scaling value. The operations 700 may also include executing, by the core processor, the set of instructions using the clock signal scaled according to the second scaling value. The operations 700 may also include updating, by the core processor, the second one or more values of the one or more registers with a third one or more values.
If the equality condition is met, the operations 700 may also include executing, by the core processor, the set of instructions using the clock signal of the core processor.
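By way of illustration only, the iterative flow of operations 700 may be sketched in software as follows. This is a minimal sketch under stated assumptions and is not the disclosed hardware: the `ToyCore` class, its stall model, the `run_operations` function, and the numeric values are all hypothetical, and the condition is modeled as "expected IPC greater than or equal to the threshold IPC," following the example given for the fifth step 610.

```python
from dataclasses import dataclass


@dataclass
class ToyCore:
    """Hypothetical stand-in for a core processor and its counter registers."""
    instructions: int = 1000       # instructions in the set (assumed)
    stall_cycles_full: int = 1000  # stall cycles at the unscaled clock (assumed)
    cycle_count: int = 0           # models the register holding the cycle count
    instr_count: int = 0           # models the register holding the instruction count

    def execute(self, scale: float) -> None:
        # Assumed toy model: memory stall time is fixed in wall-clock terms,
        # so a slower clock spends fewer cycles waiting on memory.
        self.cycle_count = self.instructions + round(self.stall_cycles_full * scale)
        self.instr_count = self.instructions


def run_operations(core, threshold_ipc, scaling_values):
    """Return (scaling values applied, whether the condition was met)."""
    applied = []
    current_scale = 1.0
    core.execute(current_scale)  # prime the counters with an unscaled run
    for scale in scaling_values:
        # Retrieve the register values and compute the expected IPC.
        ipc = core.instr_count / core.cycle_count
        # Compare against the threshold IPC.
        if ipc >= threshold_ipc:
            # Condition met: execute without applying any further scaling.
            core.execute(current_scale)
            return applied, True
        # Condition not met: pick the next scaling value, scale, and execute,
        # which updates the counter registers for the next pass.
        current_scale = scale
        core.execute(current_scale)
        applied.append(scale)
    return applied, False


if __name__ == "__main__":
    print(run_operations(ToyCore(), 0.6, [0.75, 0.5, 0.25]))
```

In this toy model, each pass that fails the comparison applies the next stored scaling value, and the loop stops as soon as the recomputed IPC reaches the threshold or the stored values are exhausted.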
In certain aspects, the operations 700 include iteratively selecting the first scaling value and the second scaling value from a set of stored scaling values, wherein each of the stored scaling values in the set of scaling values corresponds to an entry in a stored matrix.
In certain aspects, the operations 700 include retrieving, by the APB driver, the third one or more values from the one or more registers. The operations 700 may also include determining, by the IPC calculator, that the second scaling value is the last scaling value of the set of stored scaling values. The operations 700 may also include determining, by the IPC calculator, to reuse the second expected IPC for executing the set of instructions based on the third one or more values and the determination that the second scaling value is the last scaling value.
In certain aspects, the stored matrix comprises a plurality of rows and a plurality of columns, wherein each of the plurality of rows corresponds to one of the set of stored scaling values, and wherein each of the plurality of columns corresponds to one clock cycle of a contiguous series of clock cycles of the clock signal of the core processor.
In certain aspects, each of the plurality of columns is configured to indicate one of an expression or a suppression for each clock cycle of the contiguous series of clock cycles.
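As a non-limiting illustration, the stored matrix may be modeled as one row per scaling value and one column per clock cycle, where a 1 indicates that the cycle is expressed and a 0 indicates that it is suppressed. The matrix dimensions, the particular bit patterns, and the names `SCALING_MATRIX`, `gate_clock`, and `effective_ratio` are assumptions made for this sketch only.

```python
# Hypothetical 4-row, 8-column matrix: each row is keyed by a scaling value,
# and each column covers one cycle of a contiguous series of clock cycles.
SCALING_MATRIX = {
    1.0:  [1, 1, 1, 1, 1, 1, 1, 1],
    0.75: [1, 1, 1, 0, 1, 1, 1, 0],
    0.5:  [1, 0, 1, 0, 1, 0, 1, 0],
    0.25: [1, 0, 0, 0, 1, 0, 0, 0],
}


def gate_clock(row, num_cycles):
    """Return, for each input cycle, True if expressed or False if suppressed.

    The row pattern repeats when num_cycles exceeds the row length.
    """
    return [bool(row[i % len(row)]) for i in range(num_cycles)]


def effective_ratio(row):
    """Fraction of input clock cycles that the row lets through."""
    return sum(row) / len(row)


if __name__ == "__main__":
    for scale, row in SCALING_MATRIX.items():
        print(scale, gate_clock(row, 8), effective_ratio(row))
```

Under these assumptions, the fraction of expressed cycles in a row equals the scaling value associated with that row, so selecting a row selects the effective clock rate seen by the core processor.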
In certain aspects, the one or more values comprise a first value and a second value, the first value indicative of a count of a number of clock cycles over a period of time, the second value indicative of a count of instructions executed over the period of time, and determining the first expected IPC further comprises dividing the second value by the first value.
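For example, the division described above may be sketched as follows. The function name `expected_ipc`, its parameter names, and the sample counts are hypothetical, and the guard against a non-positive cycle count is an added assumption rather than part of the disclosure.

```python
def expected_ipc(cycle_count: int, instruction_count: int) -> float:
    """Return instructions per cycle over one sampling period.

    cycle_count models the first value (clock cycles over the period);
    instruction_count models the second value (instructions executed).
    """
    if cycle_count <= 0:
        # Assumed guard: an empty sampling period has no meaningful IPC.
        raise ValueError("cycle_count must be positive")
    return instruction_count / cycle_count


if __name__ == "__main__":
    print(expected_ipc(cycle_count=1000, instruction_count=750))  # 0.75
```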
In certain aspects, the one or more registers comprise a first register and a second register, the first register and the second register comprise an active monitor unit (AMU) register or a performance monitor unit (PMU) register, the AMU is configured to gather power data associated with the core processor, and the PMU is configured to gather one or more of operational data or memory data associated with the core processor.
In certain aspects, each of the first value and the second value are indicative of at least a memory stall count or an activity count, the memory stall count is indicative of a number of clock cycles between a first instruction and a second instruction of the set of instructions during which the core processor is idle, and the activity count is indicative of power consumed by the core processor during execution of the first instruction.
In certain aspects, the operations 700 may also include receiving, by the APB driver, an active signal from the core processor indicating that the core processor is active, wherein retrieving the one or more values further comprises retrieving a first value from the first register and a second value from the second register in response to receiving the active signal.
In certain aspects, the first expected IPC is determined by dividing the first value by the second value.
In certain aspects, the clock generator is a differential clock divider (DCD) calculator comprising a separate configurable matrix for each core processor of a plurality of core processors.
In some configurations, the term(s) ‘communicate,’ ‘communicating,’ and/or ‘communication’ may refer to ‘receive,’ ‘receiving,’ ‘reception,’ and/or other related or suitable aspects without necessarily deviating from the scope of the present disclosure. In some configurations, the term(s) ‘communicate,’ ‘communicating,’ ‘communication,’ may refer to ‘transmit,’ ‘transmitting,’ ‘transmission,’ and/or other related or suitable aspects without necessarily deviating from the scope of the present disclosure.
Within the present disclosure, the word “exemplary” is used to mean “serving as an example, instance, or illustration.” Any implementation or aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects of the disclosure. Likewise, the term “aspects” does not require that all aspects of the disclosure include the discussed feature, advantage or mode of operation. The term “coupled” is used herein to refer to the direct or indirect coupling between two objects. For example, if object A physically touches object B, and object B touches object C, then objects A and C may still be considered coupled to one another—even if they do not directly physically touch each other. For instance, a first object may be coupled to a second object even though the first object is never directly physically in contact with the second object. The terms “circuit” and “circuitry” are used broadly, and intended to include both hardware implementations of electrical devices and conductors that, when connected and configured, enable the performance of the functions described in the present disclosure, without limitation as to the type of electronic circuits.
One or more of the components, steps, features and/or functions illustrated herein may be rearranged and/or combined into a single component, step, feature or function or embodied in several components, steps, or functions. Additional elements, components, steps, and/or functions may also be added without departing from novel features disclosed herein. The apparatus, devices, and/or components illustrated herein may be configured to perform one or more of the methods, features, or steps described herein. The novel algorithms described herein may also be efficiently implemented in software and/or embedded in hardware.
It is to be understood that the specific order or hierarchy of steps in the methods disclosed is an illustration of exemplary processes. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the methods may be rearranged. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented unless specifically recited therein.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. A phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover: a; b; c; a and b; a and c; b and c; and a, b and c. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for” or simply as a “block” illustrated in a figure.
These apparatus and methods are described in the detailed description and illustrated in the accompanying drawings by various blocks, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using hardware, software, or combinations thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
By way of example, an element, or any portion of an element, or any combination of elements may be implemented with a “processing system” that includes one or more processors. Examples of processors include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, firmware, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The software may be stored on non-transitory computer-readable medium included in the processing system.
Accordingly, in one or more exemplary embodiments, the functions described may be implemented in hardware, software, or combinations thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, PCM (phase change memory), flash memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.