1. Technical Field
This disclosure relates to shifter logic circuits, and more particularly to funnel shifters.
2. Description of the Related Art
Most processor designs use arithmetic logic units (ALUs) that implement some type of logical and/or arithmetic shifting circuit to perform various types of bitwise translation and manipulation of values. For example, multiplication and division by a power of two may be performed by shifting a binary value left or right, respectively. There are many types of general-purpose shifters. For example, a barrel shifter may rotate the value and use a mask value to determine the type of shift. However, in some cases, general-purpose shifters such as barrel shifters may not be fast enough for a given application.
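As a brief illustration of power-of-two multiplication and division by shifting, the following Python sketch models a 32-bit unsigned register. The function names and the 32-bit width are assumptions made for illustration only. For signed values, an arithmetic right shift rounds toward negative infinity, which can differ from a truncating division.

```python
MASK32 = 0xFFFFFFFF  # assumption: values live in a 32-bit register


def mul_pow2(x: int, k: int) -> int:
    """Multiply an unsigned 32-bit value by 2**k with a left shift.

    Bits shifted past bit 31 are lost, so the result equals x * 2**k
    only when the product still fits in 32 bits.
    """
    return (x << k) & MASK32


def div_pow2(x: int, k: int) -> int:
    """Divide an unsigned 32-bit value by 2**k, rounding down, with a right shift."""
    return (x & MASK32) >> k


if __name__ == "__main__":
    print(mul_pow2(13, 3))               # 104, i.e. 13 * 8
    print(div_pow2(104, 3))              # 13, i.e. 104 // 8
    print(div_pow2(105, 3))              # 13: the low bits shifted out are discarded
    print(hex(mul_pow2(0x80000001, 1)))  # 0x2: the top bit overflowed
```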
Various embodiments of a funnel shifter are disclosed. In one embodiment, the funnel shifter includes an input, an output, and a multiplexer unit. The multiplexer unit may include a number of multiplexer levels. The multiplexer unit may be configured to perform one of a plurality of shift operations on the input value and to provide the output value in response to receiving a shift value and a shift operation value. A first multiplexer level may be configured to format and expand the input value into a larger intermediate value. At least a second multiplexer level may be configured to perform a linear shift of the intermediate value, without wrapping any bits, to create the output value. At least some of the multiplexer levels may include multiplexer select signals that correspond to the shift value and the shift operation value. Each of the select signals may be represented as a plurality of N-nary one of N signals, where N is greater than or equal to two, and each of the N-nary signals is implemented on a set of physical wires.
In another embodiment, a funnel shifter includes an input configured to receive an input value, an output configured to provide an output value, and a multiplexer unit coupled between the input and the output and including a plurality of multiplexer levels. The multiplexer unit may be configured to perform one of a number of shift operations on the input value and to provide the output value dependent upon received shift values and received shift operation values. In addition, the shift operations include bitwise logical shifts right and left, bitwise arithmetic shifts right and left, bitwise rotate left and right, and bit and byte order manipulation operations. For the bitwise operations, the received shift value determines the number of bit positions to shift.
Specific embodiments are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description are not intended to limit the claims to the particular embodiments disclosed, even where only a single embodiment is described with respect to a particular feature. On the contrary, the intention is to cover all modifications, equivalents and alternatives that would be apparent to a person skilled in the art having the benefit of this disclosure. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise.
As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. §112, paragraph six, interpretation for that unit/circuit/component.
The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.
Turning now to
In various embodiments, during processing the arithmetic logic unit 16 may perform a variety of operations that use the funnel shifter 18 to perform logical and/or arithmetic shift operations. Specifically, in one embodiment, the funnel shifter 18 may implement shift types including rotate right with extend (RRX), rotate right (ROR), logical shift right (LSR), arithmetic shift right (ASR), logical shift left (LSL), reverse signed halfword (REVSH), reverse bits (RBIT), reverse byte (REV), and reverse halfword (REV16). In one embodiment, the RRX operation shifts a 32-bit value right by one bit and shifts the carry flag into the most significant bit (bit 31). The REVSH operation reverses the byte order in the lower halfword of a 32-bit value and sign extends the result to 32 bits. The RBIT operation reverses the bit order of a 32-bit value. The REV operation reverses the byte order in a 32-bit value. The REV16 operation reverses the byte order in each 16-bit halfword of a 32-bit value. The LSR, LSL, ASR, and ROR operations may be considered bitwise shifts, in which the number of bit positions shifted is determined by a shift amount input. By contrast, the REVSH, RBIT, REV, and REV16 operations are considered bit and byte order manipulation operations.
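The shift types described above can be summarized as a behavioral reference model. The following Python sketch computes the expected 32-bit result for each operation. It describes what each operation produces, not how the funnel shifter 18 produces it. The function name `shift_ref` is hypothetical, and the model assumes that bitwise shift amounts lie in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def shift_ref(op: str, x: int, amount: int = 0, carry: int = 0) -> int:
    """Hypothetical behavioral reference for the shift types (32-bit operands).

    Assumption: for the bitwise shifts, `amount` is taken modulo 32 (0..31).
    `carry` is only used by RRX; the reversal operations ignore `amount`.
    """
    x &= MASK32
    s = amount & 31
    if op == "LSL":
        return (x << s) & MASK32
    if op == "LSR":
        return x >> s
    if op == "ASR":
        signed = x - (1 << 32) if x & 0x80000000 else x
        return (signed >> s) & MASK32  # Python's >> on negative ints is arithmetic
    if op == "ROR":
        return ((x >> s) | (x << (32 - s))) & MASK32
    if op == "RRX":
        return ((carry & 1) << 31) | (x >> 1)  # carry flag enters bit 31
    if op == "RBIT":
        return int(f"{x:032b}"[::-1], 2)  # reverse all 32 bits
    if op == "REV":
        return int.from_bytes(x.to_bytes(4, "big"), "little")  # reverse 4 bytes
    if op == "REV16":
        # swap the two bytes inside each 16-bit halfword
        return ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)
    if op == "REVSH":
        half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)  # swap bytes of the low halfword
        return (half | 0xFFFF0000) if (half & 0x8000) else half  # sign extend to 32 bits
    raise ValueError(f"unknown shift type: {op}")


if __name__ == "__main__":
    print(hex(shift_ref("ASR", 0x80000000, 4)))    # 0xf8000000
    print(hex(shift_ref("REV16", 0xAABBCCDD)))     # 0xbbaaddcc
    print(hex(shift_ref("RRX", 0x00000003, carry=1)))  # 0x80000001
```

A model like this can serve as a golden reference when checking a multiplexer-based implementation over many inputs.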
In one embodiment, the funnel shifter 18 may be implemented using standard static logic, and in other embodiments the funnel shifter 18 may be implemented using N-nary logic, which is described in greater detail below in conjunction with the description of
Referring to
As shown, the input value to the level one mux 203 is a 32-bit value. However, the level one mux 203 takes the 32-bit input value and expands it to a 63-bit output value. In this way, the level one multiplexer 203 essentially formats the input value into the appropriate output value dependent on the type of shift being performed. The level two mux 205 has a 63-bit input and a 35-bit output, and the level three mux 207 has a 35-bit input and a 32-bit output. Thus, as described in greater detail below, the level two and level three multiplexers primarily perform the shifting of the bits, while the level one multiplexer 203 formats the input value so that the shifting done in the level two and level three multiplexers is independent of the type of shift and instead depends on the number of bits to shift. As described further below in conjunction with the descriptions of
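The three-level structure can be illustrated with a small Python model. The bit widths (32 to 63, 63 to 35, 35 to 32) come from the description above. The remaining details are assumptions, as are all names in the code. The split of the shift amount into a coarse level-two shift of 0, 4, ..., 28 bits and a fine level-three shift of 0 to 3 bits is inferred from those widths, because 63 - 35 = 28 and 35 - 32 = 3. The particular level-one formatting shown for each operation is likewise one possible arrangement. In this arrangement, every operation becomes a window selection of bits r..r+31 from the intermediate value. A left shift by s is handled by placing the operand high and selecting a window at 31 - s.

```python
MASK32 = 0xFFFFFFFF
MASK35 = (1 << 35) - 1


def one_hot(index: int, n: int) -> list[int]:
    """A 1-of-n select signal: exactly one of n wires is asserted."""
    return [1 if i == index else 0 for i in range(n)]


def level1_format(op: str, x: int, s: int, carry: int = 0) -> tuple[int, int]:
    """Hypothetical level-one formatting.

    Returns a 63-bit intermediate value and a right-shift amount r (0..31) such
    that the 32-bit result is bits r..r+31 of the intermediate value.
    """
    x &= MASK32
    s &= 31
    if op == "LSR":
        return x, s                                   # zeros above x
    if op == "ASR":
        fill = 0x7FFFFFFF if x & 0x80000000 else 0    # 31 copies of the sign bit
        return (fill << 32) | x, s
    if op == "ROR":
        return ((x & 0x7FFFFFFF) << 32) | x, s        # low 31 bits of x repeated above x
    if op == "RRX":
        return ((carry & 1) << 32) | x, 1             # carry sits just above bit 31
    if op == "LSL":
        return x << 31, 31 - s                        # x in bits 62..31; window at 31 - s
    if op == "REV":
        # byte reordering is done entirely in level one; no shift afterwards
        return int.from_bytes(x.to_bytes(4, "big"), "little"), 0
    raise ValueError(f"shift type not modeled in this sketch: {op}")


def level2(inter63: int, sel2: list[int]) -> int:
    """8:1 mux per bit: pick the 35-bit window starting at bit 4*k."""
    out = 0
    for k, wire in enumerate(sel2):
        if wire:
            out |= (inter63 >> (4 * k)) & MASK35
    return out


def level3(v35: int, sel3: list[int]) -> int:
    """4:1 mux per bit: pick the 32-bit window starting at bit j."""
    out = 0
    for j, wire in enumerate(sel3):
        if wire:
            out |= (v35 >> j) & MASK32
    return out


def funnel_shift(op: str, x: int, s: int = 0, carry: int = 0) -> int:
    inter, r = level1_format(op, x, s, carry)
    assert inter >> 63 == 0, "intermediate value must fit in 63 bits"
    sel2 = one_hot(r >> 2, 8)   # coarse shift select: 0, 4, ..., 28
    sel3 = one_hot(r & 3, 4)    # fine shift select: 0..3
    return level3(level2(inter, sel2), sel3)


if __name__ == "__main__":
    print(hex(funnel_shift("ASR", 0x80000000, 3)))  # 0xf0000000
    print(hex(funnel_shift("ROR", 0x12345678, 8)))  # 0x78123456
```

Note that the level two and level three functions never look at `op`. Only the formatting in level one differs between shift types, which mirrors the division of labor described above.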
As mentioned above, in one embodiment, the funnel shifter 18 may be implemented using N-nary logic. Generally speaking, N-nary logic, which is commonly referred to as N-nary dynamic logic or NDL, refers to a logic family that supports a variety of signal encodings of the 1 of N form, where N may be any integer greater than one. A common implementation of NDL uses 1 of 4 encoding, in which four wires (signals) indicate one of four possible values.
In the N-nary design style, a 1 of 4 (or 1 of N) signal corresponds to a bundle of wires kept together throughout the inter-cell route, and no more than one wire of the bundle may be asserted, whether during precharge or evaluation. A traditional binary logic design, in comparison, would use only two wires to indicate four values by asserting neither, one, or both wires. The additional wires are one distinguishing feature of the N-nary logic style and, on the surface, make it appear unsuitable for microprocessor designs. 1 of N signals are less information-efficient than traditional signals because they require at least twice the number of wires, but N-nary signals have the advantage of carrying signal validation information, which is not possible with traditional signals. It is this additional information (when zero wires are asserted, the result is not yet known) that indirectly allows P-channel logic and all of the series synchronization elements required in traditional designs to be eliminated.
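The encoding rules described above can be made concrete with a short Python sketch. Exactly one asserted wire carries a value. Zero asserted wires means the value is not yet known. More than one asserted wire is invalid. The sketch also compares wire counts against a minimal binary encoding. All function names are hypothetical.

```python
from typing import Optional, Tuple


def encode_1_of_n(value: int, n: int) -> Tuple[int, ...]:
    """Drive exactly one of n wires to represent value (0 <= value < n)."""
    if not 0 <= value < n:
        raise ValueError("value out of range for a 1-of-N signal")
    return tuple(1 if i == value else 0 for i in range(n))


def decode_1_of_n(wires: Tuple[int, ...]) -> Optional[int]:
    """Return the carried value, or None if no wire is asserted yet."""
    asserted = [i for i, w in enumerate(wires) if w]
    if not asserted:
        return None  # no wire asserted: the result is not yet known
    if len(asserted) > 1:
        raise ValueError("invalid 1-of-N signal: more than one wire asserted")
    return asserted[0]


def wire_counts(n: int) -> Tuple[int, int]:
    """Wires needed to carry one of n values: (1-of-N wires, minimum binary wires)."""
    return n, max(1, (n - 1).bit_length())


if __name__ == "__main__":
    print(encode_1_of_n(2, 4))          # (0, 0, 1, 0)
    print(decode_1_of_n((0, 0, 0, 0)))  # None
    print(wire_counts(4))               # (4, 2): twice as many wires as binary
```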
Another advantage of the N-nary logic family is that N-nary signals include both true and false information, which means inverters are never required. This matters because a static design can no more avoid logical inversion than N-nary logic can. Although it is not obvious with any signal encoding other than 1 of 2 encoding, N-nary logic effectively produces the logical inversion at every gate. Static designs, by contrast, often require explicit inversion of signals and therefore place inverters near each signal's destination.
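One way to see why no inverter gate is needed follows from the encoding itself. Suppose wire v of a 1 of N bundle carries value v, and N is a power of two. The bitwise complement of v is then N - 1 - v. Inversion therefore amounts to reading the bundle's wires in reverse order, which is a routing choice rather than a gate. The Python sketch below illustrates this under that wire-ordering assumption. The function name is hypothetical.

```python
def invert_1_of_n(wires: tuple) -> tuple:
    """Bitwise NOT of a 1-of-N signal whose N is a power of two.

    Assumption: wire v carries value v. Since NOT v == N - 1 - v, the
    inverted signal is the same bundle with its wires in reverse order.
    An unasserted (not yet evaluated) bundle stays unasserted.
    """
    return tuple(reversed(wires))


if __name__ == "__main__":
    print(invert_1_of_n((0, 1)))        # (1, 0): 1 of 2 true becomes false
    print(invert_1_of_n((0, 1, 0, 0)))  # (0, 0, 1, 0): ~0b01 & 0b11 == 0b10
```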
Another advantage of the N-nary logic family is that it allows the designer to perform logic evaluations using a single type of transistor, for example, N-channel-only logic or P-channel-only logic. N-channel-only evaluation gates may have several benefits relative to traditional static gates. The first is the elimination of P-channel devices on input signals, the second is the elimination of the need to build the complementary function in P-channel devices, and the third is the ability to share the N-channel evaluation “stack” among multiple outputs. Sharing portions of the evaluation stack among multiple outputs is not possible with static CMOS gates because each output's function and complement cannot both be obtained from shared devices in the P-channel and N-channel stacks. Other dynamic logic families, such as Multiple Output Dynamic Logic (MODL), can produce multiple outputs by leveraging the fact that sub-functions are naturally available within dynamic evaluation stacks. The N-nary design style does not use sub-functions within evaluation stacks to produce multiple outputs; instead, it uses separate evaluation stacks to produce the multiple outputs directly. The N-nary design style is similar to MODL in its ability to reduce transistor counts, but is superior in its ability to produce fast, power-efficient circuits. Compared with static CMOS gates, the transistor-count savings may be substantial.
Referring to
In the embodiment shown in
The precharge circuit 301 is coupled to the logic tree circuit 303 and precharges the dynamic logic of the logic tree circuit 303. The precharge circuit 301 may include one or more FETs, which in one embodiment may be P-channel FETs. Each evaluation path of the logic tree circuit 303 may have its own precharge P-FET. The clock signal CKA is coupled to the precharge circuit 301. When the precharge circuit 301 uses P-channel FETs, a low level on CKA precharges the dynamic nodes of the logic tree circuit 303.
The evaluate circuit 305 is coupled to and controls the evaluation of the logic tree circuit 303. The evaluate circuit 305 may include one or more FETs, which in one embodiment may be a single N-channel FET. The CKA signal is also coupled to the evaluate circuit 305. When the evaluate circuit 305 uses an N-channel FET, a high level on CKA enables the logic tree circuit 303 to evaluate.
An exemplary 3:1 mux implemented using a 1 of 4 or “quadenary” encoding is shown in
Referring to
Since the mux 350 is a quadenary logic mux, there are four wires for each input and output. The logic tree portion 353 includes an N-channel transistor for each of the data inputs A0-A3, B0-B3, and C0-C3. Likewise, there is one N-channel transistor for each of selects S0-S3. Thus, each path through the logic tree is a stack of only two N-channel transistors, which evaluates quickly. In the illustrated embodiment, the evaluate portion 355 includes a single N-channel transistor that is coupled to circuit ground and to the CKA clock signal. The precharge portion 351 includes one P-channel transistor for each of the output lines, and each is coupled to the CKA clock signal. The output stage 357 includes an inverter and a P-channel transistor for each output line. This configuration is referred to as a hold circuit, which holds the precharge value on the output until the logic tree evaluates to a logic zero.
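The precharge and evaluate behavior of a quadenary 3:1 mux such as the mux 350 can be approximated at the logic level with the following Python sketch. It is a behavioral model rather than a circuit simulation. It assumes one select wire per data input, and it treats the hold circuit as keeping a dynamic node high unless some data-and-select path discharges it. The function and parameter names are hypothetical.

```python
def quad_mux3(cka: int, a: tuple, b: tuple, c: tuple, sel: tuple) -> tuple:
    """Behavioral model of a dynamic 1-of-4, 3:1 mux (not circuit-accurate).

    a, b, c: 1-of-4 data inputs (4-tuples of 0/1).
    sel: one select wire per data input (assumption: a 3-tuple, one-hot).
    Returns the 1-of-4 output wires Z0-Z3 (the inverted dynamic nodes).
    """
    if not cka:
        # Precharge (CKA low): P-channel devices pull every dynamic node high,
        # so after the output inverters no output wire is asserted.
        nodes = [1, 1, 1, 1]
    else:
        # Evaluate (CKA high): node i discharges if any two-transistor path
        # (data wire i in series with its select wire) conducts.
        nodes = []
        for i in range(4):
            pulldown = any(d[i] and s for d, s in zip((a, b, c), sel))
            # Otherwise the hold circuit keeps the precharged 1 on the node.
            nodes.append(0 if pulldown else 1)
    return tuple(1 - n for n in nodes)


if __name__ == "__main__":
    A, B, C = (1, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
    print(quad_mux3(0, A, B, C, (0, 1, 0)))  # (0, 0, 0, 0): precharge, value unknown
    print(quad_mux3(1, A, B, C, (0, 1, 0)))  # (0, 0, 1, 0): B (value 2) selected
```

Because no output wire is asserted during precharge, a downstream gate can tell that the mux result is not yet valid. This is the validation property of 1 of N signals described earlier.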
An exemplary 3:1 mux implemented using a 1 of 1 encoding is shown in
Referring to
Since the mux 370 is a 1 of 1 logic mux, there is one data wire for each input and output. The logic tree portion 373 includes an N-channel transistor for each of the data inputs A0, B0, and C0. Likewise, there is one N-channel transistor for each of selects S0-S2. Thus, each path through the logic tree is a stack of only two N-channel transistors, which evaluates quickly. In the illustrated embodiment, the evaluate portion 375 includes a single N-channel transistor that is coupled to circuit ground and to the CKA clock signal, and the precharge portion 371 includes one P-channel transistor for the output line, which is also coupled to the CKA clock signal. The output stage 377 includes an inverter and a P-channel transistor for the output line Z0. As in the mux 350, this configuration is referred to as a hold circuit, which holds the precharge value on the output until the logic tree evaluates to a logic zero.
As mentioned above, the different shifting types in the funnel shifter of
Referring now to
Turning to
Referring back to
Thus, referring back to
Referring to
Similarly for the ASR 3 shift example, 38 bits of the level one output from
Turning to
The peripherals 707 may include any desired circuitry, depending on the type of system. For example, in one embodiment, the system 700 may be included in a mobile device (e.g., personal digital assistant (PDA), smart phone, etc.) and the peripherals 707 may include devices for various types of wireless communication, such as WiFi, Bluetooth, cellular, global positioning system, etc. The peripherals 707 may also include additional storage, including RAM storage, solid-state storage, or disk storage. The peripherals 707 may include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc. In other embodiments, the system 700 may be included in any type of computing system (e.g., desktop personal computer, laptop, workstation, nettop, etc.).
The external system memory 705 may include any type of memory. For example, the system memory 705 may be in the DRAM family, such as synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM, or any low-power version thereof. Alternatively, system memory 705 may be implemented in static RAM (SRAM) or other types of RAM.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This patent application claims priority to Provisional Patent Application Ser. No. 61/454,274, filed Mar. 18, 2011, the content of which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61454274 | Mar 2011 | US