One consequence of increasing microprocessor performance is the greater amount of power needed to operate these more powerful microprocessors. Certain systems use an operating system software approach that controls the processor to operate at different power levels depending on the requirements of the application being executed. Certain microprocessors also allow the voltage to be adjusted. The goal of programs that adjust voltage is to reduce the performance of the processor without causing an application to miss deadlines. Further, completing a task before a deadline and then idling is less energy efficient than running the task at a slower speed so that it meets the deadline exactly.
In the following description, reference is made to the accompanying drawings which form a part hereof and which illustrate several embodiments. It is understood that other embodiments may be utilized and structural and operational changes may be made without departing from the scope of the embodiments.
The system 2 may comprise computational devices known in the art. The memory 6 may comprise a volatile memory device into which programs and instructions are loaded for execution. The processors 4a, 4b . . . 4n may comprise separate processors, each on a separate integrated circuit die. In an alternative embodiment, the processors 4a, 4b . . . 4n may comprise cores on a single integrated circuit die, such as a multi-core processor. In one embodiment, the processor optimizer 12 may independently control the voltage and frequency settings of each of the processors 4a, 4b . . . 4n, such that different voltage levels may be applied to different ones of the processors 4a, 4b . . . 4n.
In one embodiment, the processor optimizer 12 may perform the operations at blocks 104-108 to determine the scaling factor. At blocks 104 and 106, the processor optimizer 12 measures a first time for a first number of processors to execute the task and a second time for a second number of processors to execute the task. Thus, in one embodiment, the task is executed while testing for the optimal number of processors. The scaling factor is determined (at block 108) as a function of the first and second times (e.g., dividing the first time by the second time to produce a ratio and then subtracting one from the ratio). Equation (1) provides one embodiment for calculating the scaling factor (s) where the first time comprises t1 and the second time comprises t2.
$$s = \frac{t_1}{t_2} - 1 \tag{1}$$
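The ratio-based calculation described above can be sketched as follows. This is a minimal illustration that reads the parenthetical as s = t1/t2 - 1; the function name `scaling_factor` and the input validation are assumptions made for the example, not part of the described embodiments.

```python
def scaling_factor(t1: float, t2: float) -> float:
    """Return s = t1 / t2 - 1.

    t1: measured time for the first number of processors to execute the task.
    t2: measured time for the second number of processors to execute the task.
    (Hypothetical helper; the name and checks are illustrative assumptions.)
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("execution times must be positive")
    return t1 / t2 - 1.0


# Example: the task takes 12 s on two processors and 8 s on three.
print(scaling_factor(12.0, 8.0))  # 0.5
```

Under this reading, a value near zero indicates that the additional processor barely shortened the execution time, while larger values indicate that the task benefits more from added parallelism.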
In one embodiment, the first and second numbers of processors may comprise consecutive numbers, such as two and three or three and four processors. As discussed, the first and second times for the scaling factor may be measured while executing the task as part of an initial determination of the optimal number of processors 4a, 4b . . . 4n or as part of a dynamic adjustment of the number of processors to use during task execution. Alternatively, the task executed by the different numbers of processors 4a, 4b . . . 4n may comprise a test task, i.e., specialized code that is used for calculating the scaling factor. In one alternative embodiment,
In one embodiment, the processor optimizer 12 maintains an optimal processor number table 14 including entries where each entry provides a range of scaling factor values and a corresponding number of processors for the range of scaling factors. In one embodiment, each entry provides a number of processors that minimizes an energy delay for the range of scaling factor values associated with the entry. The energy delay (Q) may be calculated from the performance time (trun) to execute the task and the power expended (Ptot) using the additional processor to execute the task. The energy delay (Q) comprises the amount of energy expended over the runtime, i.e., the total cost of the computation.
The performance time (trun) to execute the task may be calculated using the scaling factor (s) and the operating frequency (f) of the processors 4a, 4b . . . 4n as shown below in equation (2).
An amount of power consumed (Ptot) to execute the task 10 with the number of processors (n) may be calculated using the operating frequency (f), an operating voltage (Vdd) supplied to the processors 4a, 4b . . . 4n, a processor-type specific static energy constant (ktech) indicating energy leakage for the processor 4a, 4b . . . 4n, and the number of processors (n) as shown in equation (3) below.
Equations (2) and (3) can be modified and modeled depending on the design of the processor, such that the scaling factor and power consumed to execute the task is dependent on the design of the processors. For instance, equations (2) and (3) are calculated based on the number of processors (n). In alternative embodiments, these equations may be calculated as some function of the number of processors (n), e.g., n multiplied or divided by some value or some other function (linear or non-linear) of n. For instance, in equation (3), the power consumed (Ptot) increases linearly as the number of processors (n) increases, e.g., two processors use twice as much power as a single processor. However, for multiple processors/cores implemented on a single integrated circuit die, increasing processors may not linearly increase the amount of power consumed (Ptot) because the multiple-cores may share certain resources. In such case, some fraction or other function of the number of processors (n) may be used, e.g., n/k, where k is constant. Thus, adjusting the number of processors (n) in equations (2) and (3) controls how the scaling factor and consumed power are calculated as the number of processors increases.
The total energy expended (Etot) with the number of processors (n) may be calculated by multiplying the performance time (trun) by the power expended (Ptot) as shown in equation (4) below.
$$E_{tot} = t_{run} \cdot P_{tot} \tag{4}$$
The energy delay (Q) comprises the product of the total energy to execute the task 10 (Etot) and the performance time (trun) to execute the task 10, which comprises the amount of energy expended over the runtime, i.e., the total cost of the computation. The energy delay (Q) may be calculated according to equation (5) below:
$$Q = E_{tot} \cdot t_{run} \tag{5}$$
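As a worked illustration of the two products just described (total energy as time multiplied by power, and energy delay as total energy multiplied by time), the following sketch computes both from numeric inputs. The function names and the sample values are assumptions chosen for the example only.

```python
def total_energy(t_run: float, p_tot: float) -> float:
    """Total energy: performance time multiplied by power expended."""
    return t_run * p_tot


def energy_delay(t_run: float, p_tot: float) -> float:
    """Energy delay: total energy multiplied by performance time,
    which is equivalent to p_tot * t_run**2."""
    return total_energy(t_run, p_tot) * t_run


# Illustrative values (assumed): a 2-second run drawing 30 watts.
print(total_energy(2.0, 30.0))  # 60.0  (joules)
print(energy_delay(2.0, 30.0))  # 120.0 (joule-seconds)
```

Because the runtime enters the product twice, a configuration that halves the runtime can draw up to four times the power before its energy delay exceeds that of the slower configuration.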
The number of processors (n) that minimizes the energy delay (Q) may be found by taking the derivative of the energy delay (Q) with respect to the number of processors (n) and setting the derivative equal to zero. Equation (6) below shows the derivative condition used to determine the number of processors (n) that minimizes the energy delay (Q).
$$\frac{dQ}{dn} = 0 \tag{6}$$
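Because the energy delay is the product of the total energy and the performance time, and the total energy is itself the product of the performance time and the power expended, the zero-derivative condition can be expanded with the product rule. The expansion below is a general sketch that treats n as a continuous variable and leaves the specific forms of trun and Ptot unspecified; it is not a closed-form solution.

```latex
\begin{aligned}
Q(n) &= E_{tot}(n)\, t_{run}(n) = P_{tot}(n)\, t_{run}(n)^{2} \\
\frac{dQ}{dn} &= \frac{dP_{tot}}{dn}\, t_{run}^{2}
              + 2\, P_{tot}\, t_{run}\, \frac{dt_{run}}{dn} = 0 \\
\Rightarrow\quad t_{run}\, \frac{dP_{tot}}{dn} &= -2\, P_{tot}\, \frac{dt_{run}}{dn}
\qquad (t_{run} \neq 0)
\end{aligned}
```

In words, the minimum lies where the relative growth in power from adding a processor is exactly twice the relative reduction in runtime.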
The developer of the optimal processor number table 14 may then solve the above differential equation to determine different numbers of processors (n) for different ranges of scaling factors, where each entry in the table indicates a range of scaling factor values and the corresponding optimal number of processors (n) for a scaling factor falling in that range to minimize the energy delay, or total energy consumption over the execution time.
The processor optimizer 12 uses (at block 112) the determined scaling factor to determine a number of processors to assign to execute the task. In one embodiment where the optimal processor number table 14 is maintained, the processor optimizer 12 may perform the operations at blocks 114 and 116 to determine the optimal number of processors to use to process the task. At block 114, the processor optimizer 12 determines an entry in the table 14 having a range of scaling factors including the determined scaling factor and determines (at block 116) the number of processors indicated in the determined entry. The processor optimizer 12 then causes the system 2 to supply (at block 118) an operational supply voltage to each of the determined number of processors to execute the task and to supply a low power mode voltage to the processors not supplied the operational supply voltage. In one embodiment, the processor optimizer 12 may cause voltage to be supplied independently to the processors 4a, 4b . . . 4n, so that some processors may be supplied the operating voltage and others a lower power mode voltage. The determined number of processors 4a, 4b . . . 4n supplied the operating voltage execute (at block 120) the task 10.
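The table lookup and voltage assignment described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the table contents, the half-open range convention, the voltage values, and the names `EXAMPLE_TABLE`, `lookup_processor_count`, and `assign_voltages` do not come from the described embodiments.

```python
from typing import List, Tuple

# Hypothetical table: (range start, range end, number of processors).
# Ranges are half-open [start, end); the values are illustrative only.
EXAMPLE_TABLE: List[Tuple[float, float, int]] = [
    (0.0, 0.3, 1),
    (0.3, 0.6, 2),
    (0.6, 0.8, 3),
    (0.8, float("inf"), 4),
]


def lookup_processor_count(table: List[Tuple[float, float, int]], s: float) -> int:
    """Return the processor count of the entry whose range contains s."""
    for low, high, count in table:
        if low <= s < high:
            return count
    raise LookupError(f"no table entry covers scaling factor {s}")


def assign_voltages(total: int, active: int,
                    v_operating: float, v_low_power: float) -> List[float]:
    """Give the first `active` processors the operating voltage and the
    remaining processors the low power mode voltage."""
    if not 0 <= active <= total:
        raise ValueError("active must be between 0 and total")
    return [v_operating if i < active else v_low_power for i in range(total)]


n = lookup_processor_count(EXAMPLE_TABLE, 0.5)
print(n)                               # 2
print(assign_voltages(4, n, 1.2, 0.6))  # [1.2, 1.2, 0.6, 0.6]
```

A linear scan is used because such a table would typically hold only a handful of entries; a sorted table could be searched with bisection if it were larger.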
In one embodiment, the processor optimizer 12 may not maintain the optimal processor number table 14 and instead calculate the optimal number of processors by solving the differential equation (6).
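One way to approximate this table-free approach is to evaluate the energy delay for each candidate processor count and keep the smallest, rather than solving the derivative condition analytically. The sketch below swaps in that exhaustive search as a stand-in technique. The caller supplies the runtime and power models; the example models (`example_t_run`, `example_p_tot`) are hypothetical placeholders and are not equations (2) and (3) of this description.

```python
from typing import Callable


def optimal_processor_count(
    t_run_model: Callable[[int], float],
    p_tot_model: Callable[[int], float],
    max_processors: int,
) -> int:
    """Return the n in 1..max_processors that minimizes
    Q(n) = P_tot(n) * t_run(n)**2 (energy multiplied by runtime)."""
    if max_processors < 1:
        raise ValueError("max_processors must be at least 1")
    best_n = 1
    best_q = float("inf")
    for n in range(1, max_processors + 1):
        t_run = t_run_model(n)
        q = p_tot_model(n) * t_run * t_run
        if q < best_q:
            best_n, best_q = n, q
    return best_n


# Illustrative models only (assumptions, not the document's equations):
def example_t_run(n: int) -> float:
    # 20% of the work is serial, 80% divides evenly across n processors.
    return 0.2 + 0.8 / n


def example_p_tot(n: int) -> float:
    # Power grows linearly with the number of processors.
    return 10.0 * n


print(optimal_processor_count(example_t_run, example_p_tot, 8))  # 4
```

With these placeholder models the energy delay falls from 10.0 at one processor to 6.4 at four processors and rises again beyond that, so the search selects four.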
The operations of
Described embodiments provide techniques to determine an optimal number of processors to use to execute a task taking into account the parallelism of the code of the task to execute, i.e., scaling factor, the performance time to execute the task based on the scaling factor, and the energy expended to execute the task with the optimal number of processors.
The described embodiments may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium, such as magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which preferred embodiments are implemented may further be accessible through a transmission medium or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise a transmission medium, such as a network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. Thus, the “article of manufacture” may comprise the medium in which the code is embodied. Additionally, the “article of manufacture” may comprise a combination of hardware and software components in which the code is embodied, processed, and executed. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the embodiments, and that the article of manufacture may comprise any information bearing medium known in the art.
The described operations may be performed by circuitry, where “circuitry” refers to either hardware or software or a combination thereof. The circuitry for performing the operations of the described embodiments may comprise a hardware device, such as an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc. The circuitry may also comprise a processor component, such as an integrated circuit, and code in a computer readable medium, such as memory, wherein the code is executed by the processor to perform the operations of the described embodiments.
The illustrated operations of
The above described equations for calculating performance time (equation (2)), power consumed (equation (3)), and energy delay (equation (5)) may include additional variables, such as frequency.
The foregoing description of various embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the embodiments to the precise form disclosed. Many modifications and variations are possible in light of the above teaching.