The operation of a processor is constrained by physical limitations of the materials used to form the processor. Exceeding these physical limitations can cause unpredictable or undesirable operation of the processor's circuits, can damage the circuits themselves, or can shorten the useful life of the processor. Accordingly, to ensure that the physical limitations are not exceeded, a typical processor is configured to operate within a specified current limit. In particular, the processor sets the frequency of the processor clock, the processor reference voltages, and other parameters so that the current generated by one or more modules of the processor is not expected to exceed the current limit for more than a threshold amount of time. However, because the processor is likely to experience variable and unpredictable workloads, and therefore consume variable and unpredictable levels of current, a typical processor is designed to operate well below the current limit. While providing this operating margin ensures that temporary current excursions from heavy workloads do not exceed the current limit, such a large operating margin places an undesirable limit on overall processor performance.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
Each of the processor cores 101-104 includes at least one instruction pipeline to execute program instruction threads. In particular, each instruction pipeline is configured to fetch instructions, decode each fetched instruction into one or more operations, dispatch the operations to one or more execution units, and retire instructions whose operations have completed execution. For purposes of description, the activities a processor core performs to execute instructions are generally referred to as tasks, and the number of tasks being performed by the processor core is referred to herein as the activity level of the processor core. As will be appreciated by one skilled in the art, the activity level of a processor core varies based on the characteristics of the set of instructions being executed by the processor core. For example, some sets of instructions require the processor core to perform a relatively high number of tasks in a short amount of time, such as some combination of a high number of mathematical calculations, memory structure accesses, communications with input/output devices, and the like. For these sets of instructions, the activity level of the processor core will be relatively high. In contrast, some sets of instructions require a relatively small number of tasks to be executed by the processor core, and therefore the activity level of the processor core will be relatively low.
To control the rate at which the processor cores 101-104 perform tasks, each of the processor cores 101-104 is supplied with a corresponding clock signal, designated CLK1, CLK2, CLK3, and CLK4, respectively. Each of the processor cores 101-104 employs its corresponding clock signal to synchronize operations of its synchronous logic elements. The frequencies of the clock signals CLK1-CLK4 thus govern the rate at which the processor cores 101-104 perform their corresponding tasks. The amount of current generated by a processor core, referred to herein as the processor core's activity current, is a function of the activity level of the processor core (the number of tasks the processor core is performing) and the frequency of the processor core's clock signal (the rate at which the processor core can execute the tasks). As explained further below, the processor 100 is generally configured to monitor a combined activity current for the processor cores 101-104 and, in response to the combined activity current exceeding a threshold, to throttle one or more of the processor cores 101-104 by lowering the frequency of one or more of the clock signals CLK1-CLK4. The processor 100 thereby ensures that it operates within specified current limits.
In the example embodiment of
To monitor the activity currents at the processor cores 101-104 and the shared cache 105, the processor 100 employs a set of five performance monitors, designated performance monitor 111, performance monitor 112, performance monitor 113, performance monitor 114, and performance monitor 115, collectively referred to as performance monitors 111-115. Each of the performance monitors 111-115 is configured to identify the activity current at a corresponding one of the processor cores 101-104 and the shared cache 105. In the depicted embodiment, the performance monitor 111 identifies the activity current at processor core 101, the performance monitor 112 identifies the activity current at processor core 102, the performance monitor 113 identifies the activity current at processor core 103, the performance monitor 114 identifies the activity current at processor core 104, and the performance monitor 115 identifies the activity current at the shared cache 105.
In some embodiments, the performance monitors 111-115 identify the corresponding activity current indirectly by identifying the occurrence of specified events, and then calculating the activity current based on a known or predicted relationship between the identified events and an activity current value or values corresponding to the identified events. Examples of such events include cache accesses, cache misses, fetching of specified types of instructions, dispatch of specified types of operations, and the like. In other embodiments, the performance monitors 111-115 employ one or more current sensors to directly measure activity current at the corresponding processor core. In still other embodiments, the performance monitors 111-115 identify the activity current at the corresponding processor core based on a combination of current sensors and identified events.
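The event-based approach can be illustrated with a minimal sketch in which the activity current is estimated as a weighted sum of event counts. The event names, the per-event weights, and the function name `estimate_activity_current` are hypothetical and chosen only for illustration; the disclosure does not specify the relationship between events and activity current values.

```python
# Hypothetical per-event current weights in arbitrary units (assumed, not from the disclosure).
EVENT_WEIGHTS = {
    "cache_access": 2.0,
    "cache_miss": 5.0,
    "fetch": 1.0,
    "dispatch": 1.5,
}


def estimate_activity_current(event_counts, weights=EVENT_WEIGHTS, baseline=0.0):
    """Estimate activity current as a baseline plus a weighted sum of event counts."""
    total = baseline
    for event, count in event_counts.items():
        # Events without a known weight contribute nothing in this sketch.
        total += weights.get(event, 0.0) * count
    return total


# Event counts observed over one hypothetical sampling interval.
sample = {"cache_access": 10, "cache_miss": 2, "fetch": 20, "dispatch": 8}
estimate = estimate_activity_current(sample)
```

A linear model is only one possible form of the known or predicted relationship; a lookup table or a calibrated nonlinear model could serve the same role.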
To throttle the processor cores 101-104 based on the activity currents, the processor 100 employs a combined activity current monitor (CCM) 120 and a power and clock control module (PM) 125. The CCM 120 is generally configured to identify a combined activity current value (CCV) based on the activity currents identified by the performance monitors 111-115. To illustrate, each of the performance monitors 111-115 generates a signal, designated AC1, AC2, AC3, AC4, and AC5 (collectively referred to as AC1-AC5), respectively, indicating the activity current at the corresponding processor core or shared cache. The CCM 120 receives the signals AC1-AC5 and combines the indicated activity currents to determine the CCV. For example, in some embodiments the CCM 120 determines the CCV by averaging the indicated activity currents. The CCM 120 compares the CCV to an activity current threshold (ACT), wherein the ACT is based on the specified current limits for the processor 100. Based on the comparison, the CCM 120 sets the state of a control signal, designated STRETCH, that controls the clock frequency of one or more of the clock signals CLK1, CLK2, CLK3, and CLK4, as described further herein. In particular, in response to the CCV exceeding the ACT, the CCM 120 asserts the signal STRETCH to reduce the frequency of one or more of CLK1, CLK2, CLK3, and CLK4, thereby reducing the activity current at one or more of the processor cores 101-104 and ensuring that the processor 100 does not exceed the specified current limits.
The PM 125 is generally configured to control the power states for the processor cores 101-104 based on the state of the signal STRETCH. As used herein, a power state of a processor core refers to a specified state of the processor core wherein one or more parameters of the processor core, such as a clock frequency, reference voltage, and the like are set such that the processor core consumes less than a specified amount of power relative to other power states. Examples of power states include an active state, wherein a processor core executes operations at a normal rate, and a low-power state, wherein the processor core executes operations at a lower rate, or does not execute operations but maintains stored data. In the example of
As noted above, the PM 125 sets the power state for each of the processor cores 101-104 based on the state of the signal STRETCH. Thus, if the signal STRETCH is in a negated state, indicating that the CCV is below the ACT, the PM 125 maintains the clock signals CLK1, CLK2, CLK3, and CLK4 at a relatively high frequency. In response to assertion of the signal STRETCH, the PM 125 reduces the frequency of at least one of the clock signals CLK1, CLK2, CLK3, and CLK4, thereby reducing the activity level at the corresponding processor cores. In some embodiments, the PM 125 changes the power state of each of the processor cores 101-104 equally, by changing the frequency of each of the clock signals CLK1, CLK2, CLK3, and CLK4 by the same amount (e.g., reducing the frequency of each clock signal by the same percentage).
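The equal-percentage case can be sketched as follows. The function name `throttle_uniform`, the frequency values, and the default reduction of 12.5% are assumptions made for illustration and are not taken from the disclosure.

```python
def throttle_uniform(frequencies_mhz, stretch, reduction=0.125):
    """Return new clock frequencies.

    When STRETCH is negated the frequencies are returned unchanged; when it is
    asserted every clock is reduced by the same fraction (hypothetical default).
    """
    if not stretch:
        return dict(frequencies_mhz)
    return {name: freq * (1.0 - reduction) for name, freq in frequencies_mhz.items()}


# Hypothetical starting frequencies in MHz for the four clock signals.
clocks = {"CLK1": 3000.0, "CLK2": 3000.0, "CLK3": 2000.0, "CLK4": 2000.0}
unchanged = throttle_uniform(clocks, stretch=False)
stretched = throttle_uniform(clocks, stretch=True, reduction=0.25)
```

With a 25% reduction, the 3000 MHz clocks in this example drop to 2250 MHz and the 2000 MHz clocks drop to 1500 MHz, so the ratio between the clocks is preserved.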
In other embodiments, the PM 125 selects a subset of the processor cores 101-104 and changes the power state only of the processor cores in the selected subset. The PM 125 selects the subset of processor cores based on any of a variety of criteria, such as the individual activity currents at the processor cores 101-104, priority values associated with threads executing at the processor cores 101-104, and the like. In some embodiments, the PM 125 is not a single module, but instead each of the processor cores 101-104 is associated with a different PM, and the CCM 120 provides an individual stretch control signal to each of the different PMs. The CCM 120 thereby individually changes the frequencies of each of the clock signals CLK1, CLK2, CLK3, and CLK4 according to the corresponding processor core's contribution to the combined activity current. Thus, for example, if the processor core 102 contributes more to combined activity current than processor core 101, the CCM 120 reduces the frequency of CLK2 by a greater amount than the frequency of CLK1.
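One way to picture contribution-based throttling is the sketch below, in which each clock is reduced in proportion to its core's activity current relative to the largest contributor. The scaling rule, the `max_reduction` parameter, and the function name `throttle_by_contribution` are hypothetical; the disclosure states only that a core contributing more to the combined activity current has its clock reduced by a greater amount.

```python
def throttle_by_contribution(frequencies_mhz, activity_currents, max_reduction=0.25):
    """Reduce each clock in proportion to its core's activity current.

    Both dictionaries are keyed by clock name (an assumption of this sketch).
    The largest contributor is reduced by max_reduction; the others are reduced
    by a proportionally smaller fraction.
    """
    peak = max(activity_currents.values())
    if peak <= 0:
        # No measurable activity current, so nothing to throttle.
        return dict(frequencies_mhz)
    result = {}
    for clock, freq in frequencies_mhz.items():
        share = activity_currents[clock] / peak
        result[clock] = freq * (1.0 - max_reduction * share)
    return result


# Hypothetical values: the core driven by CLK2 draws twice the current of the core driven by CLK1.
freqs = {"CLK1": 3000.0, "CLK2": 3000.0}
currents = {"CLK1": 2.0, "CLK2": 4.0}
throttled = throttle_by_contribution(freqs, currents)
```

In this example CLK2 is reduced by 25% and CLK1 by 12.5%, matching the ordering described above for processor cores 101 and 102.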
It will be appreciated that the techniques described herein can be applied to processor configurations other than the embodiment of
The averaging module 232 is configured to receive the accumulated activity value and divide the accumulated activity value by the number of the processor cores 101-104 and shared cache 105 that are active, thereby generating an average activity value. Thus, in some scenarios, such as when one or more of the processor cores 101-104 is idle or in a sleep state, the divisor applied by the averaging module 232 is less than the combined count of the processor cores 101-104 and the shared cache 105.
The threshold register 234 is configured to store a threshold value indicative of a threshold activity current level. In some embodiments, the threshold value is set by the processor 100 to indicate a specified maximum activity current level, such that if the average activity current for the processor 100 were to exceed the maximum activity current for a threshold amount of time, the processor 100 would be damaged or experience unpredictable operations.
The compare module 236 is configured to compare the average activity value generated by the averaging module 232 to the threshold value stored at the threshold register 234. When the average activity value does not exceed the threshold value, the compare module 236 maintains the signal STRETCH in a negated state, thereby preventing throttling of the processor cores 101-104 due to activity level. In response to the average activity value exceeding the threshold value, the compare module 236 asserts the signal STRETCH, thereby throttling one or more of the processor cores 101-104 and reducing the activity current at the processor 100.
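The averaging and comparison steps can be sketched as follows. The sketch assumes the accumulated activity value is the sum of the values reported on AC1-AC5, that idle units report zero, and that all quantities are in arbitrary units; the function names and sample values are hypothetical.

```python
def average_activity_value(activity_currents, active_flags):
    """Divide the accumulated activity value by the number of active units."""
    accumulated = sum(activity_currents)  # assumed behavior of the accumulation step
    active_count = sum(1 for flag in active_flags if flag)
    if active_count == 0:
        # Nothing is active, so there is no activity current to average.
        return 0.0
    return accumulated / active_count


def stretch_asserted(activity_currents, active_flags, threshold):
    """Return True (STRETCH asserted) only when the average exceeds the threshold."""
    return average_activity_value(activity_currents, active_flags) > threshold


# Hypothetical readings for cores 101-104 and shared cache 105; core 103 is idle.
readings = [4.0, 6.0, 0.0, 2.0, 3.0]
active = [True, True, False, True, True]
average = average_activity_value(readings, active)
```

Here the accumulated value is 15.0 and four units are active, so the average is 3.75. A threshold of 4.0 leaves STRETCH negated, while a threshold of 3.5 would assert it.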
Curve 341 represents the assumed activity current if the activity current were identified at only one of the processor cores 101-104. That is, curve 341 represents a “single source” configuration under which the processor 100 is not directly identifying the combined activity current at all of the processor cores 101-104 and the shared cache 105. Under such a single source configuration, to prevent the combined activity current from exceeding the threshold, the processor 100 assumes that each of the processor cores 101-104 and the shared cache 105 generates activity current at least equal to the activity current measured at the single source. As illustrated by curve 341, this assumption results in higher activity current values than the average activity current value depicted by curve 340. Thus, under the single source configuration the processor 100 is likely to throttle the processor cores 101-104 more often than under the configuration where the processor 100 employs a combined activity current. For example, at time 345 the activity currents at the processor cores 101-104 and the shared cache 105 are such that, under the single source configuration, the assumed activity current is greater than the threshold 342, which would require the processor 100 to throttle at least one of the processor cores 101-104. However, the average activity current value at time 345 is less than the threshold 342, so the processor 100 does not throttle any of the processor cores 101-104. Thus, throttling the processor cores 101-104 based on combined activity currents rather than on a single source is likely to reduce overall throttling. In addition, throttling based on combined activity currents allows each of the processor cores 101-104 to operate at a point closer to the threshold 342, improving processor performance.
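The difference between the two configurations at a point such as time 345 can be shown with a small numeric sketch. The activity current values, the threshold, and the choice of the first unit as the single monitored source are hypothetical and serve only to illustrate the comparison.

```python
def average_activity(currents):
    """Combined configuration: average the activity currents of all units."""
    return sum(currents) / len(currents)


def single_source_assumed(currents, monitored_index=0):
    """Single source configuration: every unit is assumed to draw at least as
    much as the one monitored unit, so the assumed value equals that reading."""
    return currents[monitored_index]


# Hypothetical activity currents for cores 101-104 and shared cache 105 (arbitrary units).
currents = [9.0, 3.0, 4.0, 2.0, 2.0]
threshold = 6.0

throttle_single = single_source_assumed(currents) > threshold
throttle_combined = average_activity(currents) > threshold
```

With these values the single source reading of 9.0 exceeds the threshold and would trigger throttling, while the average of 4.0 does not, which mirrors the relationship between curves 341 and 340 described above.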
At block 408 the compare module 236 determines whether the average activity current value is greater than the threshold value stored at the threshold register 234. If not, the method flow moves to block 410 and the compare module 236 maintains the STRETCH signal in a negated state. The negated state of the STRETCH signal in turn causes the PM 125 (
Returning to block 408, if the compare module 236 determines that the average activity current value is greater than the threshold value, the method flow moves to block 412 and the compare module 236 asserts the STRETCH signal. In response, the PM 125 reduces the clock frequency of one or more of the clock signals CLK1, CLK2, CLK3, and CLK4, thereby changing the power state of the corresponding ones of the processor cores 101-104.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.