An embodiment of the present invention relates to the field of electronic systems and, more particularly, to an approach for integrated circuit performance improvement and/or power reduction.
In the field of integrated circuit design, particularly for very large scale integration (VLSI) designs, performance and power consumption are typically key focus areas. More specifically, it is generally desirable to design an integrated circuit with a goal of achieving high performance and low power consumption where possible.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements, and in which:
A method and apparatus for correlated logic micro-caching are described. In the following description, particular components, circuits, cache memory architectures, etc. are described for purposes of illustration. It will be appreciated, however, that other embodiments are applicable to other types of components, circuits, and/or cache memories, for example.
References to “one embodiment,” “an embodiment,” “example embodiment,” “various embodiments,” etc., indicate that the embodiment(s) of the invention so described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one embodiment” does not necessarily refer to the same embodiment, although it may.
Aspects of embodiments of the invention may be described for purposes of illustration as being implemented in one of hardware, firmware or software. It will be appreciated that such aspects may instead be implemented in a different medium.
One way to improve the performance of an integrated circuit while reducing the power consumption of that circuit is to avoid computing signal values that have already been computed. For example, currently, instruction caches, data caches, translation look-aside buffers, etc. are used to benefit from expected localities of time and space with respect to architectural objects such as memory addresses, operand values, predicted branches, etc.
Frequently, the behaviors of apparently architecturally unrelated circuit functions in complex integrated circuits may also be correlated. Such correlations may exist in addition to those that would be expected when viewing a circuit only with respect to its architecture, and may also benefit from correlated logic micro-caching as described in more detail below. For example, a circuit simulation of a given Register Transfer Language (RTL) model driven by given input traces may expose correlations between “inputs” and “outputs” that cannot, or cannot easily, be described in terms of circuit architectural abstractions. The result may simply indicate that activity at one point predicts activity at another point.
More specifically, for one embodiment, logic is provided to at least initiate computation of an output value to be applied at an output node in response to receiving an input value at an input node. A correlated logic micro cache is also provided to store information sufficient to recover a first input value associated with the input node and a first output value associated with the first input value, the first output value to be applied at the output node and computation by the logic to be halted if a second input value to be applied at the input node matches the first input value.
Further details of this and other embodiments are provided in the following description.
While three example correlated logic micro-caches 105A-105C and associated logic 110A-110C are shown in
The example correlated logic micro-cache 205 also includes comparator(s) 220 that provide the capability to compare an input value X received at an input of associated logic 210 with each of the input tags of the correlated logic micro cache memory 215 and to indicate whether the comparison is a “hit” or a “miss.” Output(s) of the comparator(s) are coupled to a select input of a multiplexer (mux) 222 and to a clock disable signal line 223 coupled to a local clock 224 used to clock the associated logic 210. Additionally, the correlated logic micro-cache 205 may include logic 225 to convert an input value into a tag and logic 230 to copy an output signal to a data line of the correlated logic micro cache memory 215. Additional circuitry and/or logic may be provided for various embodiments.
For one embodiment, the cache tags P-R are capable of storing one or more input values for the associated logic 210. The corresponding cache lines P-R are capable of storing output value(s) that are provided at an output of the associated logic 210 in response to the corresponding input value(s).
For another embodiment, one or more of the tag and/or the cache lines may instead store information sufficient to recover an input value and/or associated output value. For example, an input and/or output value may be compressed, encoded, encrypted or otherwise represented in the tag and/or the cache line, and then recovered using additional circuitry (not shown).
Further, the correlated logic micro cache memory 215 may include additional storage for some embodiments to track most and/or least recently used information, indicate the validity of data, and/or provide other information related to a particular tag and/or cache line.
The associated logic 210 includes one or more stages of combinational logic. For the example shown in
In operation, an input value or set of input values X may be received at an input node 237 and provided both to comparator(s) 220 and to associated logic 210. Computation of an output value by associated logic 210 may then proceed concurrently with a comparison of the input value X and input value(s) associated with information stored in input tags P-R.
If the input value(s) X is identified as a hit when compared with input tags P-R, then a tag match mux selection signal may be asserted on the signal line 250, and a clock disable signal may be asserted on the signal line 223. The associated output value from the corresponding cache line may be provided at the input to the mux 222 over the signal line 235 if it is not already being provided as described in more detail below. Assertion of the tag match mux selection signal causes the mux 222 to selectively provide the cached output from the associated cache line at an output node 252. Further, assertion of the clock disable signal on the signal line 223 causes the local clock 224 to be disabled and computation by associated logic 210 to be discontinued. In this manner, power consumption and/or computation time associated with full computation of an output value in response to input value(s) X may be reduced.
If the input value X is instead determined to be a miss when compared to input values stored in the correlated logic micro cache memory 215, the tag match mux selection signal on the signal line 250 is deasserted and the output of associated logic 210 is instead selectively provided to the output node 252.
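The hit and miss behavior described above can be sketched in software as a behavioral model. This is only an illustrative sketch, not the circuit itself: the class name `CorrelatedLogicMicroCache`, the `store` and `apply` methods, and the `evaluations` counter are hypothetical names introduced here, and halting the local clock is modeled simply by not calling the function that stands in for the associated logic.

```python
from typing import Callable, Dict, Tuple


class CorrelatedLogicMicroCache:
    """Behavioral model of tag/cache-line pairs placed beside a logic block.

    All names here are hypothetical; the model ignores timing entirely.
    """

    def __init__(self, logic: Callable[[int], int]) -> None:
        self.logic = logic                # stands in for the associated logic
        self.lines: Dict[int, int] = {}   # input tag -> cached output value
        self.evaluations = 0              # how often the logic actually ran

    def store(self, input_value: int, output_value: int) -> None:
        """Record an input/output pair as a tag and its cache line."""
        self.lines[input_value] = output_value

    def apply(self, input_value: int) -> Tuple[int, bool]:
        """Return (value at the output node, hit flag) for one input value."""
        if input_value in self.lines:     # comparator reports a hit
            # Mux selects the cached line; the clock disable is modeled
            # by never invoking self.logic on this path.
            return self.lines[input_value], True
        # Miss: mux selects the computed output of the associated logic.
        self.evaluations += 1
        return self.logic(input_value), False


def parity(value: int) -> int:
    """Example logic function: parity of the set bits in value."""
    return bin(value).count("1") % 2


cache = CorrelatedLogicMicroCache(parity)
cache.store(0b1011, parity(0b1011))
print(cache.apply(0b1011))   # served from the cache line
print(cache.apply(0b0011))   # computed by the logic
```

In this model a hit leaves `evaluations` unchanged, which corresponds to the power saving from discontinuing computation; any performance benefit depends on timing that the sketch does not represent.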
Any one of a variety of different approaches may be used to design the correlated logic micro cache 205 and associated logic 210 and/or downstream logic (not shown) of various embodiments such that a valid output value is provided at the output node 252 regardless of whether a computed or cached output value is used. For one embodiment, for example, the correlated logic micro cache 205 and associated logic 210 may be designed such that the delay associated with providing a cached output value is substantially the same as the delay associated with providing a computed output value. This may be done, for example, by padding the path for the cached output value with additional clock cycle(s) as needed. For such embodiments, while there may not be performance benefits associated with using the correlated logic micro cache, there may be a power savings as a result of halting computation upon detecting a correlated logic micro cache hit.
For other embodiments, for example, downstream (or dependent) circuits (not shown) may be designed such that they are ready for the output value at the node 252, whether via a fast correlated logic micro cache hit or via a slower cache miss (computation). For such embodiments, a “valid” bit (not shown) or other similar approach may be used to indicate valid output data.
For some embodiments, in the case of a miss, the new input value(s) is stored as a new or replacement tag and the associated output value(s) calculated by the associated logic may be stored as a new/replacement cache line entry in the correlated logic micro cache memory 215. If the correlated logic micro cache memory 215 is determined to be full, one of the tag/cache line pairs (or the only tag/cache line pair in the case of a single entry memory) may be replaced with the new input value/output value pair. Where multiple tag/cache line pairs are provided and most/least recently used information is tracked, the least recently used tag/cache line may be replaced. A different approach may be used for other embodiments to determine which cache line to replace where the correlated logic micro cache memory includes multiple cache lines.
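The fill-on-miss and least recently used replacement described above can be sketched as follows. This is a minimal software illustration under stated assumptions: the class name `LruMicroCache`, its `capacity` parameter, and the use of an ordered dictionary to track recency are all hypothetical choices made for the example, and other replacement approaches are equally possible.

```python
from collections import OrderedDict
from typing import Callable


class LruMicroCache:
    """Model of a multi-entry micro cache memory with LRU replacement.

    Hypothetical sketch: entry order in the OrderedDict stands in for the
    most/least recently used tracking storage.
    """

    def __init__(self, logic: Callable[[int], int], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.logic = logic
        self.capacity = capacity
        self.lines: "OrderedDict[int, int]" = OrderedDict()  # oldest first

    def apply(self, input_value: int) -> int:
        if input_value in self.lines:
            # Hit: mark the pair as most recently used.
            self.lines.move_to_end(input_value)
            return self.lines[input_value]
        # Miss: compute, then store the new input/output pair.
        output_value = self.logic(input_value)
        if len(self.lines) >= self.capacity:
            # Memory is full: replace the least recently used pair.
            self.lines.popitem(last=False)
        self.lines[input_value] = output_value
        return output_value


cache = LruMicroCache(lambda x: x * x, capacity=2)
for value in (1, 2, 1, 3):
    cache.apply(value)
print(list(cache.lines))   # tags remaining after input 3 replaced tag 2
```

With a capacity of 1 the same code covers the single-entry case, in which the only tag/cache line pair is replaced on every miss.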
For one embodiment, for subsequent operation, the correlated logic micro cache 205 may speculatively predict that the next input value or set of input values will be the same as the previous input value(s). In this case, the previous output value may be speculatively provided at the correlated output. The correlated output is then replaced only in the event that the input value changes, and computation proceeds only in the event that the new input value is not found in the correlated logic micro cache memory.
Further, for some embodiments, the tag match mux selection signal may be a “sticky” signal, i.e. it may remain asserted until a miss is detected, thereby continuously selecting the same output value(s) until the input value(s) change.
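The sticky selection behavior can be illustrated with a cycle-by-cycle model of a single-entry cache. This is a hedged sketch only: the function name `run_sticky_select` and its return format are hypothetical, each loop iteration stands in for one input arrival, and the select signal is modeled as a state bit that is cleared only when a miss is detected.

```python
from typing import Callable, Iterable, List, Tuple


def run_sticky_select(
    logic: Callable[[int], int], inputs: Iterable[int]
) -> Tuple[List[Tuple[int, bool]], int]:
    """Return ([(output value, select asserted)], number of logic evaluations).

    Hypothetical single-entry model: one tag, one cache line, one valid bit.
    """
    valid = False      # no usable entry after power up or reset
    tag = 0
    line = 0
    select = False     # tag match mux selection signal
    evaluations = 0
    trace: List[Tuple[int, bool]] = []
    for value in inputs:
        if valid and value == tag:
            # Input unchanged: select is (or stays) asserted and the
            # cached output keeps being provided without recomputation.
            select = True
        else:
            # Miss: deassert select, compute, and refill the single entry.
            select = False
            line = logic(value)
            evaluations += 1
            tag, valid = value, True
        trace.append((line, select))
    return trace, evaluations


trace, evaluations = run_sticky_select(lambda v: v + 1, [5, 5, 5, 7, 7])
print(trace)
print(evaluations)
```

For the five inputs above the logic function runs only twice, once for each distinct run of input values, which is the case the speculative prediction is intended to exploit.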
For some embodiments, the correlated logic micro cache memory 215 may be invalidated or set to an impossible value upon power up or reset of the integrated circuit including the correlated logic micro cache memory 215. For other embodiments, the correlated logic micro cache memory 215 may be in an indeterminate state upon power up of the integrated circuit chip and may be loaded with valid data during operation as described above. For still other embodiments, predetermined input/output value(s) may be loaded into the correlated logic micro cache memory 215 from a separate memory upon power-up or reset of a host integrated circuit chip, for example.
To determine where placement and use of a correlated logic micro cache according to one or more embodiments may be particularly advantageous, a variety of different approaches may be used. For one embodiment, for example, applications may be run on a Register Transfer Language (RTL) model of an integrated circuit while nodes are evaluated for correlations between given inputs and outputs that may not otherwise have been identified. This type of evaluation may be performed, for example, concurrently with toggle coverage evaluation. Such correlations may be identified in the behaviors of small parts of large circuits, for example. For some embodiments, different types of applications, e.g. graphics applications versus word-processing or other types of applications, may be run to identify different input(s)/output(s) correlations in different operating environments and/or during the use of different types of applications.
Once such correlations are identified, it can be determined whether the correlated logic micro cache of one or more embodiments may be beneficial in association with the identified correlations, and if so, the number of tag/cache line pairs that is likely to be most advantageous. This evaluation may depend on a variety of factors such as, for example, the most common input/output values for the selected collection of input/output nodes, the size, complexity and/or typical delay through the associated logic, the expected performance improvement/power saving associated with the correlated logic micro cache and/or other factors. If common input/output value combinations are identified, these may be used to pre-load the correlated logic micro cache as described above.
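One way such an evaluation might be approached is to replay a recorded trace of values observed at candidate nodes and estimate the hit rate for different numbers of tag/cache line pairs. The sketch below is an assumption-laden illustration, not a described tool: the function names `hit_rate` and `preload_candidates`, the LRU replacement used during replay, and the trace format (a plain sequence of observed values or input/output pairs) are all hypothetical.

```python
from collections import Counter, OrderedDict
from typing import Hashable, List, Sequence, Tuple


def hit_rate(input_trace: Sequence[Hashable], entries: int) -> float:
    """Estimate the fraction of inputs that would hit an LRU micro cache."""
    if entries < 1:
        raise ValueError("entries must be at least 1")
    lines: "OrderedDict[Hashable, bool]" = OrderedDict()  # oldest tag first
    hits = 0
    for value in input_trace:
        if value in lines:
            hits += 1
            lines.move_to_end(value)        # mark as most recently used
        else:
            if len(lines) >= entries:
                lines.popitem(last=False)   # replace least recently used
            lines[value] = True
    return hits / len(input_trace) if input_trace else 0.0


def preload_candidates(
    pairs: Sequence[Tuple[Hashable, Hashable]], entries: int
) -> List[Tuple[Hashable, Hashable]]:
    """Return the most common (input, output) pairs seen in the trace."""
    return [pair for pair, _count in Counter(pairs).most_common(entries)]


trace = [1, 2, 1, 2, 3, 1]
for entries in (1, 2, 3):
    print(entries, hit_rate(trace, entries))
print(preload_candidates([(1, 10), (2, 20), (1, 10)], entries=1))
```

Comparing the estimated hit rate across entry counts, together with the size and delay of the associated logic, gives a rough basis for deciding whether a micro cache is worthwhile at a given node and how many pairs to provide.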
Referring now to
In contrast to the embodiment(s) of
As shown in
It will be appreciated that, while the use of four fuses is described above, various embodiments may provide for selective enablement of one or more correlated logic micro caches using a different approach and/or a different number and/or location of fuses or other elements that provide for selective connections after fabrication.
For one embodiment, the system 500 includes one or more single or multi-core processor(s) 507 coupled to a bus 511. Also coupled to the bus 511 is a chipset 513, which may, for example, include memory, graphics and/or input/output control capabilities. One or more memories 517 and one or more input and/or output devices may be coupled to the chipset 513. For some embodiments, a battery connector and/or battery 521 may be coupled to the chipset 513 to provide a power source for the system 500. Further, some embodiments may also include a network communications device such as a Bluetooth device, a wireless or wired local area network device, a modem or other type of device that may provide for connection to a personal, local, wide area or other type of network, for example.
One or more of the components of the system 500 may incorporate one or more correlated logic (CL) micro caches 505 of one or more embodiments. While CL micro cache(s) 505 are shown in
Also, while the example computing system 500 is shown with functionality partitioned in a given way, it will be appreciated that different systems may be partitioned in a different manner.
Using the correlated logic micro cache approach of one or more embodiments, it may be possible to save power associated with computation of logic functions that have already been computed and/or that are computed repeatedly. Further, depending upon the logic complexity, a performance improvement may be realized where an output may be accessed from a correlated logic micro-cache rather than being re-computed.
Thus, various embodiments of an approach for correlated logic micro-caching are described. In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be appreciated that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. For example, while specific example circuitry has been described herein, it will be appreciated that different circuitry that accomplishes similar results may be used for other embodiments. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.