1. Field
The present disclosure pertains to the field of data processing and, more specifically, to the field of content addressable memories (“CAMs”) in microprocessors and other data processing apparatuses.
2. Description of Related Art
CAMs are used in applications where entries are identified, or “looked-up,” based on their contents instead of their addresses. These applications include caches and translation look-aside buffers.
In a typical configuration, CAM look-ups are implemented in dynamic logic. A match to a CAM entry is indicated by a logical high state on a hit line that is pre-charged high in one phase of the clock, and conditionally discharged by one or more CAM cells in the other phase. Each CAM cell corresponds to one bit of one CAM entry, and includes a pull-down transistor controlled by a comparator. The comparator turns the pull-down transistor on when the CAM entry bit does not match the corresponding look-up bit.
In this typical configuration, every cell of every entry must be checked on a look-up. However, in most applications where CAMs are used, there are only a few matches per look-up, usually no more than one. Therefore, almost every CAM look-up requires charging an aggregate load proportional to the number of entries times the number of bits per entry, and discharging virtually the entire load. Consequently, CAMs may account for a significant portion of the power consumed by high performance microprocessors.
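The cost described above can be illustrated with a behavioral sketch (not a circuit model) of a conventional CAM look-up. The function and variable names are illustrative only; the count of per-cell comparisons stands in for the dynamic load that is charged and discharged on every look-up.

```python
def cam_lookup(entries, key):
    """Return indices of matching entries, counting every cell comparison.

    Mirrors the conventional structure: every cell of every entry is
    checked on a look-up, regardless of how few entries actually match.
    """
    hits = []
    comparisons = 0
    for i, entry in enumerate(entries):
        match = True
        for entry_bit, key_bit in zip(entry, key):
            comparisons += 1          # each CAM cell compares its one bit
            if entry_bit != key_bit:  # a mismatch turns on the pull-down,
                match = False         # discharging the precharged hit line
        if match:
            hits.append(i)
    return hits, comparisons

entries = [(0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0)]
hits, comparisons = cam_lookup(entries, (0, 1, 1, 0))
# comparisons equals entries * bits (3 * 4 = 12) even though only two entries hit
```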
The present invention is illustrated by way of example and not limitation in the accompanying figures.
The following description describes embodiments of techniques for sharing the comparison logic in a CAM. In the following description, numerous specific details, such as logic and circuit configurations, are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Additionally, some well known structures, circuits, and the like have not been shown in detail, to avoid unnecessarily obscuring the present invention.
Embodiments of the present invention provide techniques for sharing the comparison logic in a CAM, and may be applied to any CAM used in any application, including caches and translation look-aside buffers. Accordingly, the data stored in a CAM using these techniques may be any type of information, including memory addresses, represented by binary digits or in any other form. A CAM using these techniques may have any number of entries and any number of bits per entry, and may be functionally organized according to any known approach. For example, the CAM may be organized into two sections, one for match content and one for payload content, where the match content is the data to be compared to the data presented to the CAM for look-up (the “look-up data”), and the payload content is the data to be delivered if there is a hit to the corresponding match content. Alternatively, the CAM may have no payload section, and instead be organized to deliver the match content itself, or simply an indicator of whether or not there is a match.
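The match/payload organization described above may be sketched behaviorally as follows; the tuple layout and function name are assumptions made for illustration only.

```python
def cam_lookup(cam, lookup_data):
    """Return the payload content of every entry whose match content
    equals the look-up data presented to the CAM."""
    return [payload for match, payload in cam if match == lookup_data]

# Each entry pairs match content with payload content to be delivered on a hit.
cam = [(0b1010, "line A"), (0b0111, "line B"), (0b1010, "line C")]
result = cam_lookup(cam, 0b1010)
```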
Every entry location includes six CAM cells 101 to store the six least significant bits of a tag address. However, the four entry locations in each set share four CAM cells 101 to store the four most significant bits of a tag address. For example, entry locations 111, 112, 113, and 114 include lower portions 111a, 112a, 113a, and 114a, respectively, but share upper portion 110a.
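The storage organization above may be sketched as follows, assuming four sets of four entry locations as in the illustrated embodiment; the dictionary layout is an illustrative assumption, not part of the disclosed circuit.

```python
def make_set():
    """One set: a single shared 4-bit upper portion plus four private
    6-bit lower portions, each lower portion with its own valid bit."""
    return {"upper": None, "lowers": [None] * 4, "valid": [False] * 4}

sets = [make_set() for _ in range(4)]

# CAM cells per set: 4 shared upper-bit cells + 4 entries * 6 lower-bit cells
shared_cells = 4 * (4 + 4 * 6)   # 112 cells with sharing
unshared_cells = 4 * 4 * 10      # 160 cells if every entry stored all 10 tag bits
```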
When an entry is loaded into the cache, the four most significant bits of the tag address are compared to the contents of the upper portion for the set into which the entry is loaded. For example, for an entry to be placed into entry location 111, the four most significant bits of the tag address are compared to the contents of upper portion 110a. If there is a match, then the six least significant bits of the tag address are loaded into lower portion 111a, and upper portion 110a and the entries in locations 112, 113, and 114 are not changed. However, if there is not a match, then the six least significant bits of the tag address are loaded into lower portion 111a, the four most significant bits are loaded into upper portion 110a, and the entries in locations 112, 113, and 114 are invalidated.
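The load behavior described above may be sketched as follows; the dictionary layout and the split of a 10-bit tag into a 4-bit upper and a 6-bit lower portion follow the illustrated embodiment, while the field names are illustrative assumptions.

```python
def load(cam_set, tag, way):
    """Load a 10-bit tag into entry location `way` of a set whose four
    entry locations share one 4-bit upper portion."""
    upper, lower = tag >> 6, tag & 0x3F
    if cam_set["upper"] == upper:
        # Upper portions match: only this entry location changes.
        cam_set["lowers"][way] = lower
        cam_set["valid"][way] = True
    else:
        # No match: replace the shared upper portion and invalidate
        # the other entries of the set, which relied on the old value.
        cam_set["upper"] = upper
        cam_set["lowers"][way] = lower
        cam_set["valid"] = [False] * 4
        cam_set["valid"][way] = True

cam_set = {"upper": None, "lowers": [None] * 4, "valid": [False] * 4}
load(cam_set, 0b1010_001100, 0)  # mismatch: sets the shared upper, invalidates the set
load(cam_set, 0b1010_111111, 1)  # same upper: only entry location 1 changes
```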
On a look-up, the four most significant bits of the look-up data are compared to the contents of all of the four upper portions 110a, 120a, 130a, and 140a, and the six least significant bits are compared to the contents of all of the sixteen lower portions 111a, 112a, 113a, 114a, 121a, 122a, 123a, 124a, 131a, 132a, 133a, 134a, 141a, 142a, 143a, and 144a. AND gates 150 indicate, for each entry location, whether there is a hit to both its lower portion and the corresponding shared upper portion. Therefore, on a look-up, the dynamic power consumed by the comparison logic for the most significant bits is only one-fourth of what it would be if every entry had its own comparison logic.
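The look-up described above may be sketched as follows; the data layout is an illustrative assumption, and the per-set upper comparison mirrors the four shared upper portions (four upper comparisons instead of sixteen), with the final conjunction standing in for AND gates 150.

```python
def lookup(sets, tag):
    """Return (set, way) pairs that hit, comparing the 4 MSBs once per
    set (shared upper portion) and the 6 LSBs once per entry location."""
    upper, lower = tag >> 6, tag & 0x3F
    hits = []
    for set_index, s in enumerate(sets):
        upper_hit = (s["upper"] == upper)  # one compare per set, not per entry
        for way in range(4):
            lower_hit = s["valid"][way] and (s["lowers"][way] == lower)
            if upper_hit and lower_hit:    # the AND of upper and lower hits
                hits.append((set_index, way))
    return hits

sets = [{"upper": None, "lowers": [None] * 4, "valid": [False] * 4}
        for _ in range(4)]
sets[2]["upper"] = 0b1010
sets[2]["lowers"][1] = 0b001100
sets[2]["valid"][1] = True
result = lookup(sets, 0b1010_001100)
```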
One entry location per group is designated as a “primary” entry location. For example, in group 210, entry location 212 is the primary entry location. The other entry locations are designated as “prevalidated” entry locations.
Each prevalidated entry location also includes two prevalidation bits. For example, prevalidated entry location 213 includes upper prevalidation bit 313a to indicate whether the upper portion 213a of the prevalidated entry matches the upper portion 212a of the corresponding primary entry, and middle prevalidation bit 313b to indicate whether the middle portion 213b of the prevalidated entry matches the middle portion 212b of the primary entry. One or both prevalidation bits may be set when an entry is loaded into the prevalidated entry location, based on comparisons of the upper and middle portions of the address being loaded with the upper and middle portions of the corresponding primary entry location. When a primary entry is evicted, all of the corresponding prevalidation bits are cleared.
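The setting and clearing of the prevalidation bits may be sketched as follows. The dictionary layout, field names, and the assumed 4/3/3-bit split of the address into upper, middle, and lower portions are illustrative assumptions only.

```python
def load_prevalidated(primary, entry, tag):
    """Load a tag into a prevalidated entry location, setting each
    prevalidation bit from a comparison with the primary entry."""
    entry["upper"] = tag >> 6            # assumed 4-bit upper portion
    entry["middle"] = (tag >> 3) & 0b111  # assumed 3-bit middle portion
    entry["lower"] = tag & 0b111          # assumed 3-bit lower portion
    entry["prevalid_upper"] = (entry["upper"] == primary["upper"])
    entry["prevalid_middle"] = (entry["middle"] == primary["middle"])

def evict_primary(prevalidated_entries):
    """Evicting a primary entry clears all corresponding prevalidation bits."""
    for e in prevalidated_entries:
        e["prevalid_upper"] = e["prevalid_middle"] = False

primary = {"upper": 0b1010, "middle": 0b011, "lower": 0b101}
entry = {}
load_prevalidated(primary, entry, 0b1010_011_110)  # upper and middle match
```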
The prevalidation bits may be used to enable or disable the comparison logic for the upper and middle portions of the prevalidated entry, and to select whether to use the hit signals from the prevalidated entry or the primary entry. For example, if upper prevalidation bit 313a is set, the comparison logic for upper portion 213a is disabled and multiplexer 323a selects the hit signal from upper portion 212a instead of upper portion 213a. The outputs of multiplexers 323a and 323b, as well as the hit signal from lower portion 213c, are input to AND gate 323 to generate the hit signal for prevalidated entry location 213.
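The hit generation described above may be sketched as follows; the conditional expressions stand in for the multiplexers, and the final conjunction for the AND gate. The field names and assumed 4/3/3-bit portion widths are illustrative only.

```python
def prevalidated_hit(primary, entry, tag):
    """Generate the hit signal for a prevalidated entry location.

    When a prevalidation bit is set, the comparison for that portion is
    skipped and the primary entry's hit signal is reused instead (the
    multiplexer), so the prevalidated entry's comparators can stay disabled.
    """
    upper, middle, lower = tag >> 6, (tag >> 3) & 0b111, tag & 0b111
    upper_hit = ((primary["upper"] == upper) if entry["prevalid_upper"]
                 else (entry["upper"] == upper))
    middle_hit = ((primary["middle"] == middle) if entry["prevalid_middle"]
                  else (entry["middle"] == middle))
    # AND of the selected upper and middle hits with the lower-portion hit
    return upper_hit and middle_hit and (entry["lower"] == lower)

primary = {"upper": 0b1010, "middle": 0b011}
entry = {"upper": 0b1010, "middle": 0b011, "lower": 0b110,
         "prevalid_upper": True, "prevalid_middle": True}
hit = prevalidated_hit(primary, entry, 0b1010_011_110)
```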
To disable the comparison logic, the prevalidation bits may be inverted by inverters 302, gated with the clock by AND gates 303, and used as enable inputs to the corresponding CAM cells. The clock gating may be used to disable the CAM cells' look-up logic while their hit lines are being precharged by PMOS pull-up transistors 304.
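The enable path described above reduces to a simple Boolean condition, sketched here for illustration (the function name is an assumption; in the circuit this is the inverter followed by the clock-gating AND gate):

```python
def cell_enable(prevalidation_bit, lookup_phase):
    """A CAM cell's look-up logic is enabled only when its portion is
    not prevalidated (inverter) and the clock is in the look-up phase
    rather than the precharge phase (clock-gating AND gate)."""
    return (not prevalidation_bit) and lookup_phase
```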
Therefore, the sharing of address bits may reduce the dynamic power consumption of CAM 200 by disabling the comparison logic for portions of the addresses of prevalidated entries that match corresponding portions of primary entries.
System 700 also includes memory 720 coupled to processor 710 through bus 715, or through any other buses or components. Memory 720 may be any type of memory capable of storing data to be operated on by processor 710, such as static or dynamic random access memory, semiconductor-based read only memory, or a magnetic or optical disk memory. Look-up data to be compared to data stored in CAM 200 may be stored in memory 720 or may represent an address of data in memory 720. System 700 may include any other buses or components in addition to processor 710, bus 715, and memory 720.
Processor 200, processor 100, or any other processor or component designed according to an embodiment of the present invention, may be designed in various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally or alternatively, a circuit level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level where they may be modeled with data representing the physical placement of various devices. In the case where conventional semiconductor fabrication techniques are used, the data representing the device placement model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce an integrated circuit.
In any representation of the design, the data may be stored in any form of a machine-readable medium. An optical or electrical wave modulated or otherwise generated to transmit such information, a memory, or a magnetic or optical storage medium, such as a disc, may be the machine-readable medium. Any of these mediums may “carry” or “indicate” the design, or other information used in an embodiment of the present invention, such as the instructions in an error recovery routine. When an electrical carrier wave indicating or carrying the information is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is performed, a new copy is made. Thus, the actions of a communication provider or a network provider may be making copies of an article, e.g., a carrier wave, embodying techniques of the present invention.
Thus, techniques for sharing comparison logic in a CAM have been disclosed. While certain embodiments have been described, and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. For example, the number of entries sharing the match logic in an embodiment like that of