This disclosure relates to bitline encoding, including bitline encoding for memory circuits such as static random access memory (SRAM) circuits.
Rapid advances in electronics and communication technologies, driven by immense customer demand, have resulted in the worldwide adoption of an immense range of electronic devices. Many of these devices receive, store, and process data at significant clock rates, heavily relying on memory storage to do so. With increased clock rates comes increased energy consumption. Reduced energy consumption is often a design goal that is pursued to achieve, as just one example, longer operation on a limited battery charge.
The discussion below describes a static random access memory (SRAM) read circuit. The read circuit reduces energy consumption by employing a local sense amplifier with multiple-bit (e.g., two-bit) encoding in the output stage. In addition, the read circuit includes an ‘m’-input to ‘n’-output (e.g., four-input to two-output) decoding global sense amplifier. The global sense amplifier is responsive to encoded low-swing global bitlines driven by the output stage.
While the discussion below focuses primarily on the use of encoded bitlines in a memory architecture, encoded bitlines may be used in other circuits. That is, the encoded bitline techniques described below may be added to any type of circuitry that carries data on individual bitlines. As one example, a data bus between a processor and an interface port (e.g., a PCIe port) may encode, transfer, and decode encoded data over the bitlines, data lines, or data buses between the processor and the interface port.
Within the banks are individual SRAM memory cells with local sense circuitry, e.g., the local sense circuitry 104 and 154. Locally, the memory cells may adhere to an architecture with bitline negative (BL*) lines (e.g., 106, 156) and bitline positive (BL) lines (e.g., 108, 158) to drive and read data into cross coupled inverters that hold the data in each memory cell.
In the architecture 100, the memory cells output their data on a single-ended global output bitline, e.g., the single-ended global output bitline 110. In contrast, in the architecture 150, the memory cells output their data on differential global output bitlines, e.g., the differential global output bitlines 160. Accordingly, the architecture 100 includes global sense circuitry 112 that receives the single-ended data and drives the single-ended output line 114, e.g., to other connected circuitry. The architecture 150 includes global sense circuitry 162 that receives the differentially communicated data on the differential global output bitlines 160, and that drives the single-ended output line 116 accordingly.
As a specific example, the memory cells may be 6T SRAM memory cells. In 6T cells, a read is performed by activating a word line in one of the banks, and then activating a local sense circuit within the bank. The local sense circuitry drives the global bitlines to global sense circuitry, which in turn drives a (typically) single-ended output from the memory array. In many typical use cases, the global bitlines consume nearly 50% of the total dynamic read power consumption of the memory array.
Table 1, below, shows normalized dynamic power consumption of the architecture 100.
The leftmost column shows the data state of two bits of data. Note that during a read operation, the global bitlines 110 are typically pre-charged. In this single-ended example, pre-charging involves charging the global bitlines 110 to substantially the supply voltage Vdd, while discharging the global bitlines 110 involves driving the global bitlines 110 to substantially Vss, e.g., ground.
To output two bits of data from the memory array that are 00, the architecture 100 pre-charges and then discharges two global bitlines. That is, the two global bitlines transition from a fully charged state to a fully discharged state, consuming two units of power, as shown in Table 1. Similarly, to output the 01 or the 10 state, both global bitlines are fully charged, and one global bitline is fully discharged; the other global bitline remains pre-charged and does not transition. Table 1 shows these operations consuming one unit of power. To output the 11 state, both global bitlines are pre-charged, and both remain pre-charged, consuming no dynamic power. Accordingly, Table 1 shows zero units of power in the right column. The average power consumption of outputting two bits of data across all four possible data states is 1 unit of power. Expressed another way, the four possible combinations of bits cause four discharge events or state transitions starting from the full pre-charged state: two for output 00, one for output 01, one for output 10, and zero for output 11.
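The arithmetic in the preceding paragraph can be checked with a minimal sketch. It assumes a normalized model in which one full-swing discharge costs one unit of power; the names `FULL_SWING_UNIT` and `single_ended_power` are illustrative and do not come from the disclosure.

```python
# Normalized cost of one full-swing discharge (Vdd down to Vss); an assumed model.
FULL_SWING_UNIT = 1.0


def single_ended_power(bits):
    """Units of power to output `bits` on pre-charged single-ended bitlines.

    A 0 bit discharges its bitline; a 1 bit leaves the bitline pre-charged.
    """
    return sum(FULL_SWING_UNIT for bit in bits if bit == 0)


STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
power_by_state = {state: single_ended_power(state) for state in STATES}
average_power = sum(power_by_state.values()) / len(STATES)

for state, units in power_by_state.items():
    print(state, units)
print("average:", average_power)
```

Under this model the four states cost 2, 1, 1, and 0 units, so the average is 1 unit, matching the text.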
Table 2, below, shows normalized dynamic power consumption of the architecture 150.
As in Table 1, the leftmost column shows the data state of two bits of data. For the architecture 150 it is assumed that during a read operation, the global bitlines 160 are differential, and are pre-charged in a low-swing manner, e.g., to Vdd/2 or another pre-defined fraction of Vdd, so that a bit transition does not cause a full discharge of the supply voltage or a full charge of the supply voltage. That is, for the differential global bitlines in the architecture 150, pre-charging involves charging the global bitlines 160 to a portion of the supply voltage, e.g., Vdd/2, while discharging the global bitlines 160 involves driving the global bitlines 160 to substantially Vss, e.g., ground. In other implementations, low-swing encoding may include charging the global bitlines to Vdd, and discharging them to Vdd/2 or another fraction of Vdd.
Note that two pairs of differential global bitlines 160 carry the data in this example, one pair per global sense amplifier 162. To output two bits of data from the memory array that are 00, the architecture 150 low-swing pre-charges all four global bitlines, and then discharges two global bitlines. That is, two global bitlines transition from a partially charged state to a fully discharged state, consuming 0.5 units of power each (one unit of power in total), as shown in Table 2. Similarly, to output the 01 or the 10 state, all four global bitlines are low-swing pre-charged, and two global bitlines are fully discharged. Table 2 shows these operations consuming one unit of power. Likewise, to output the 11 state, all four global bitlines are low-swing pre-charged, and two transition to fully discharged states, consuming one unit of power as noted in Table 2. The average power consumption of outputting two bits of data across all four possible data states is again 1 unit of power. Expressed in the same way as for Table 1 above, the four possible combinations of bits cause eight low-swing discharge events starting from the low-swing pre-charged state: two for output 00, two for output 01, two for output 10, and two for output 11.
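The differential case can be sketched the same way. The sketch assumes each low-swing discharge costs 0.5 units and that exactly one line of each differential pair discharges per bit; which line of the pair discharges for which bit value is an assumption made only for illustration.

```python
# Normalized cost of one low-swing discharge (e.g., Vdd/2 down to Vss); an assumed model.
LOW_SWING_UNIT = 0.5


def differential_pair_levels(bit):
    """Levels on one differential pair after a read.

    1 means the line is still pre-charged, 0 means it was discharged.
    The polarity (which line discharges for a 1) is a hypothetical choice.
    """
    return (1, 0) if bit else (0, 1)


def differential_power(bits):
    """Units of power to output `bits`, one differential pair per bit."""
    discharges = sum(
        level == 0 for bit in bits for level in differential_pair_levels(bit)
    )
    return discharges * LOW_SWING_UNIT


STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
power_by_state = {state: differential_power(state) for state in STATES}
average_power = sum(power_by_state.values()) / len(STATES)
total_discharge_events = sum(
    int(units / LOW_SWING_UNIT) for units in power_by_state.values()
)
```

Every state costs one unit here because one line per pair always discharges, which is why the average stays at 1 unit despite the reduced swing.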
Note that the local sense circuitry includes bitline encoder circuitry, e.g., the bitline encoder circuitry 210. Further, in the architecture 200, the memory cells output their data on multiple pairs of encoded global output bitlines, e.g., the pairs of encoded global output bitlines 212. These pairs form an encoded output that carries encoded representations of the input bits read from the individual memory cells. In one implementation, the encoded global output bitlines are low-swing bitlines, e.g., pre-charged to Vdd, and discharged to Vdd/2.
The architecture 200 also includes global sense circuitry 214 that receives the encoded representations on the encoded output, and drives the single-ended output lines to connected circuitry. In this example, the global sense circuitry 214 converts the encoded representation into two individual single-ended bit outputs, e.g., the bit output 216 and the bit output 218. As noted above, the memory cells may be 6T SRAM memory cells. A read is performed by activating a word line in one of the banks, and then activating the local sense circuit, including encoder circuitry, within the bank. The local sense circuitry drives the global bitlines with an encoded output to the global sense circuitry, which in turn drives single-ended outputs from the memory array.
In this example, the architecture 200 uses two-bit encoding to map a first input bit and a second input bit of data (read from the memory cells) into four one-hot low swing dynamic global bitlines. The encoding is done such that a transition of one of the four global bitlines corresponds to one of four possible states of the two bits of data.
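A minimal sketch of such a two-bit to four-bitline one-hot mapping follows, assuming pre-charged bitlines. The specific assignment of data states to bitlines is hypothetical (the line whose index equals the two-bit value transitions) and is not taken from the tables of this disclosure.

```python
def encode_two_bits(first_bit, second_bit):
    """Map two input bits to four pre-charged global bitline levels.

    1 means the bitline stays pre-charged, 0 means it transitions.
    Hypothetical mapping: the line at index (first_bit, second_bit) transitions.
    """
    selected = (first_bit << 1) | second_bit
    return tuple(0 if line == selected else 1 for line in range(4))


codewords = {
    (first_bit, second_bit): encode_two_bits(first_bit, second_bit)
    for first_bit in (0, 1)
    for second_bit in (0, 1)
}

for state, levels in codewords.items():
    print(state, levels)
```

Each of the four data states yields a distinct codeword with exactly one transitioning bitline, which is the property the encoding relies on.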
The global sense circuitry 214 implements a four-input to two-output decoder, with the decoding 302 shown in
Tables 3 and 4 assume pre-charged bitlines. The encoding technique applies to pre-discharged bitlines as well, as shown in the encoding in Table 5 below. Note that, for pre-charged bitlines, the encoded representation causes fewer discharge events than the differentially defined bits would cause on differentially encoded global bitlines. In implementations with pre-discharged bitlines, the encoded representation causes fewer charge events than the differentially defined bits would cause on differentially encoded global bitlines.
Table 6, below, shows normalized dynamic power consumption of the architecture 200 in the rightmost column, compared with the two architectures 100 and 150.
In the architecture 200, low-swing pre-charge to Vdd and discharge to Vdd/2 is used on the global bitlines (and other low-swing ranges may be employed in other implementations). Regardless of whether the global bitlines are pre-charged or pre-discharged, in each of the four data states (for two bits read from memory), one of the four global bitlines in each encoded group (e.g., the encoded global output bitlines 212) changes charge state. For pre-charged global bitlines, the charge state transition is from the Vdd level to Vdd/2, and the other global bitlines in the encoded group stay at the pre-charged level. Each group of encoded global output bitlines consumes 0.5 units of power to carry the encoded representation, regardless of the two bit inputs.
Note that for the two data bit example, each encoded group of global bitlines includes four global bitlines to carry an encoding that represents the data state of the two data bits. The power consumed by the state transition after pre-charge to represent the two data bits read from the memory cells is 0.5 units of power, because there is a single state transition (e.g., one-hot) with the encoding shown in Tables 3 and 5. The average power consumption of two bits of data across all four possible data states is 0.5 units of power.
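The comparison among the three architectures can be tabulated with a short sketch. The per-state values below are transcribed from the prose descriptions of the architectures 100, 150, and 200 rather than from the table itself, and the dictionary layout is only an illustrative choice.

```python
# Units of power per two-bit data state, keyed by architecture number.
# Values follow the prose: single-ended full swing (100), differential
# low swing (150), and one-hot encoded low swing (200).
power_table = {
    "00": {"100": 2.0, "150": 1.0, "200": 0.5},
    "01": {"100": 1.0, "150": 1.0, "200": 0.5},
    "10": {"100": 1.0, "150": 1.0, "200": 0.5},
    "11": {"100": 0.0, "150": 1.0, "200": 0.5},
}

averages = {
    arch: sum(row[arch] for row in power_table.values()) / len(power_table)
    for arch in ("100", "150", "200")
}

for arch, units in averages.items():
    print(f"architecture {arch}: average {units} units")
```

The averages come out to 1, 1, and 0.5 units respectively, so the encoded group halves the average global bitline power in this normalized model.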
The architecture 200 reduces global bitline dynamic power by 50% relative to the architectures 100 and 150 (an average of 0.5 units versus 1 unit per two bits read). In large SRAMs, global bitline power can account for up to 50% of the total dynamic power of the memory. As a result, the architecture 200 reduces total dynamic power by up to 25% when low-swing (e.g., Vdd/2) switching is used on the global bitlines. In some implementations, as little as 100 mV of signal margin may be used on the global bitlines to provide an even greater power reduction, e.g., total dynamic power reduction of 30% or more.
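The 25% figure follows from multiplying the global bitline share of total dynamic power by the reduction achieved on the global bitlines. A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```python
def total_dynamic_power_reduction(global_bitline_share, global_bitline_reduction):
    """Fraction of total dynamic power saved.

    global_bitline_share: fraction of total dynamic power spent on global bitlines.
    global_bitline_reduction: fractional power reduction on those bitlines.
    """
    return global_bitline_share * global_bitline_reduction


# 50% share of total dynamic power, 50% reduction on the global bitlines.
reduction = total_dynamic_power_reduction(0.5, 0.5)
print(f"{reduction:.0%}")
```

A smaller global bitline share yields a proportionally smaller total saving; for example, a 40% share with the same 50% reduction gives a 20% saving.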
The logic 400 includes receiving input bits (402), e.g., differentially defined bits read from memory cells. The logic 400 encodes the input bits according to a pre-defined mapping to obtain an encoded representation of the bits (404). The encoded representation is carried over a pre-determined number of bitlines in a group, e.g., 4 global bit lines that carry a 4-bit encoded representation of two bits of data. The logic 400 then outputs the encoded representation over the group of bitlines (406). The group of bitlines may be, as examples, low-swing encoded pre-charged global memory cell bitlines, or data bus lines between devices.
A receiving circuit receives the encoded representation (408). For example, the receiving circuit may be global sense circuitry in a memory array, or a bus interface circuit in communication with a data bus. The receiving circuit decodes the encoded representation (410), and outputs the decoded input bits to subsequent circuitry (412).
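The encode, transfer, and decode flow of the logic 400 can be sketched end to end as follows. The mapping and the function names are assumptions for illustration; the `idle_level` parameter models pre-charged bitlines (idle at 1) or pre-discharged bitlines (idle at 0).

```python
def encode(first_bit, second_bit, idle_level=1):
    """Encode two bits onto four bitlines with a single transition.

    Hypothetical mapping: the line at index (first_bit, second_bit) leaves
    the idle level; all other lines stay at the idle level.
    """
    selected = (first_bit << 1) | second_bit
    active_level = 1 - idle_level
    return tuple(
        active_level if line == selected else idle_level for line in range(4)
    )


def decode(bitlines, idle_level=1):
    """Recover the two input bits from a four-bitline encoded group."""
    active = [line for line, level in enumerate(bitlines) if level != idle_level]
    if len(active) != 1:
        raise ValueError(
            f"expected exactly one transitioned bitline, got {len(active)}"
        )
    selected = active[0]
    return (selected >> 1) & 1, selected & 1


# Round trip for both pre-charged and pre-discharged bitlines.
for idle_level in (1, 0):
    for first_bit in (0, 1):
        for second_bit in (0, 1):
            bitlines = encode(first_bit, second_bit, idle_level)
            assert decode(bitlines, idle_level) == (first_bit, second_bit)
```

Rejecting any group that does not show exactly one transition is a design choice of this sketch; a hardware decoder would typically resolve its outputs only after one bitline has moved.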
Expressed another way with regard to memory architectures, the bitline encoding is implemented in circuitry that includes first memory cell connections configured to differentially define a first input bit, and second memory cell connections configured to differentially define a second input bit. The local sensing may be differential or single-ended sensing, however. The circuitry also includes encoding circuitry with an encoded output. The encoding circuitry is configured to receive the first input bit, receive the second input bit, and map the first input bit and the second input bit to a pre-defined encoded representation. The circuitry outputs the pre-defined encoded representation on the encoded output.
In a memory architecture, the first memory cell connections and the second memory cell connections may be local sense amplifier outputs, e.g., SRAM sense amplifier outputs. When the encoded output is a pre-charged output, the pre-defined encoded representation includes fewer discharge states than fully differentially representing the first input bit and second input bit on a set of outputs. When the encoded output is a pre-discharged output, the pre-defined encoded representation includes fewer charge states than fully differentially representing the first input bit and second input bit on a set of outputs. Decoding circuitry receives the encoded output, determines the first input bit and the second input bit from the encoded output, and communicates the first input bit and the second input bit as individual data bits on a decoded output.
The bitline encoding techniques described above may be implemented in many different types of circuits, systems, and devices. Examples include instruction processors, such as a Central Processing Unit (CPU), a microcontroller, or a microprocessor; an Application Specific Integrated Circuit (ASIC); a Programmable Logic Device (PLD); or a Field Programmable Gate Array (FPGA). The encoding techniques may be used with memory bitlines, data lines, data buses, and other types of signal lines (e.g., for address, control, and data signals) that connect discrete interconnected hardware components on a printed circuit board, or that connect components manufactured on a single integrated circuit die, distributed among multiple integrated circuit dies, or implemented in a Multiple Chip Module (MCM) of multiple integrated circuit dies in a common package, as examples.
Note also that the bitline encoding techniques described above are not limited to two-input bit to four-output bit encoding. Any number of input bits may be mapped to an encoded representation with fewer discharge events, in the case of bitline pre-charging, or fewer charge events, in the case of bitline pre-discharging. Table 7 provides an example of mapping three input bits to an eight-bit encoded representation with one state transition. Table 8 provides an example of mapping four input bits to a 16-bit encoded representation with a single state transition.
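The single-transition mapping generalizes to any number of input bits, as the following sketch illustrates. It assumes a 0.5 unit cost per low-swing transition and a fully differential baseline with two bitlines per data bit; the index-based mapping is hypothetical and is not taken from the tables.

```python
from itertools import product

# Normalized cost of one low-swing transition; an assumed model.
LOW_SWING_UNIT = 0.5


def one_hot_encode(bits, precharged=True):
    """Map n input bits to 2**n bitline levels with exactly one transition."""
    idle, active = (1, 0) if precharged else (0, 1)
    selected = 0
    for bit in bits:
        selected = (selected << 1) | bit
    return tuple(
        active if line == selected else idle for line in range(2 ** len(bits))
    )


def compare(num_bits):
    """Return (encoded lines, encoded power, differential lines, differential power)."""
    codewords = [
        one_hot_encode(bits) for bits in product((0, 1), repeat=num_bits)
    ]
    # Every codeword on pre-charged lines has exactly one discharged line.
    assert all(word.count(0) == 1 for word in codewords)
    return 2 ** num_bits, LOW_SWING_UNIT, 2 * num_bits, num_bits * LOW_SWING_UNIT


for num_bits in (2, 3, 4):
    print(num_bits, compare(num_bits))
```

The sketch makes the trade-off visible: power per read stays at 0.5 units as the number of input bits grows, while the number of encoded bitlines grows as a power of two.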
Tables 3, 5, 7, and 8 provide examples of single transition encoding. Other encoded representations may include multiple bitline transitions, with the goal to save power in comparison to a fully differential representation. These encoded representations may be implemented for any number of input bits. Table 9 provides one such example of an encoded representation of three input bits to six encoded global bitlines. Encoded representations that are a multiple of two bits wide may be useful to build on top of memory architectures that already fabricate two differential global bitlines per data bit.
The encoding in Table 9 uses, on average, ⅝ of a unit of power for data transmission, compared to 1.5 units for a fully differential representation on the global bitlines.
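One way to arrive at a ⅝ unit average with six bitlines is sketched below. The assignment is hypothetical and is not the mapping of Table 9: six data states use a single transition and two data states use two transitions, chosen only so that the arithmetic matches the stated average. Each low-swing transition is assumed to cost 0.5 units.

```python
from fractions import Fraction

# Normalized cost of one low-swing transition; an assumed model.
LOW_SWING_UNIT = Fraction(1, 2)
NUM_LINES = 6


def encode_three_bits(value):
    """Hypothetical 3-bit to 6-bitline code (not the mapping of Table 9).

    1 means the bitline stays pre-charged, 0 means it transitions.
    Values 0..5 transition one line each; values 6 and 7 transition two lines.
    """
    if not 0 <= value <= 7:
        raise ValueError("value must fit in three bits")
    if value < 6:
        discharged = {value}
    elif value == 6:
        discharged = {0, 1}
    else:
        discharged = {2, 3}
    return tuple(0 if line in discharged else 1 for line in range(NUM_LINES))


codewords = [encode_three_bits(value) for value in range(8)]
average_encoded = sum(
    word.count(0) * LOW_SWING_UNIT for word in codewords
) / len(codewords)
average_differential = 3 * LOW_SWING_UNIT

print(average_encoded, average_differential)
```

With this assignment the eight codewords are distinct, the encoded average is ⅝ of a unit, and the fully differential baseline is 1.5 units for three bits.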
Several example implementations of bitline encoding have been specifically described. However, many other implementations are also possible.
This application claims priority to provisional application Ser. No. 62/280,469, filed Jan. 19, 2016, which is entirely incorporated by reference.
Number | Date | Country | |
---|---|---|---|
62280469 | Jan 2016 | US |