FIELD OF THE INVENTION
The present invention relates to a controller for a Static Random Access Memory (SRAM). In particular, the invention relates to adjustment of timing in an SRAM to provide reduced power consumption by use of a minimum wordline pulse width selected according to the requirements of the data processing application.
BACKGROUND OF THE INVENTION
A chip layout for a Static Random Access Memory (SRAM) is typically arranged having a memory cell region comprising an array of memory cells for storing data, Input/Output (I/O) logic providing data input and output interfaces, and control logic performing address decoding to enable memory cells of a selected column of memory cells in the memory cell array. When retrieving data from an SRAM, the control logic causes a wordline to be activated, which activates a vertical column of memory cells of the memory cell array, and the data from the selected column of activated memory cells is transferred over a plurality of horizontal bitlines to the I/O logic and to output pins of the SRAM. The example vertical wordline and horizontal bitline orientations are arbitrary and are used for explanation purposes only. A memory cell associated with a shortest length wordline and a shortest length bitline has an access time which is faster than that of a memory cell associated with a longest length wordline and a longest length bitline.
In prior art SRAM applications, the SRAM data is typically clocked through a series of pipeline stages. In these applications, the response time of the slowest element establishes the clocking rate for the memory system. Further, these prior art SRAM applications typically rely on error-free data accuracy, since the data being handled may be CPU instructions or data whose accuracy must be fully preserved when reading and writing; a very low introduced error rate is not only essential, but is the subject of many error detection and correction methods used to ensure very high data accuracy.
A new type of processing system used in artificial intelligence networks and information processing architectures, known as a Neural Network (NN), does not have these error-free data processing constraints. In certain NN applications, the data is handled asynchronously in independent stages, and does not rely on pipeline stage processing with fixed pipeline stage processing times governed by a system clock. Additionally, in certain processing applications such as image processing, because of the vast amount of data being processed by the NN and the random noise already present in the data, these NN applications may not require a high degree of accuracy in reading data, and are insensitive to additional random errors introduced by memory data retrieval when the variance of those errors is smaller than the noise variance of the data used for training or inference. In other NN applications, the data precision requirement is low, and only the high order (most significant) bits of data are important or in use, so the low order (least significant) bits of data may be corrupted or lost without loss of accuracy of the inferences formed by the NN processor. However, prior art SRAMs provide neither flexibility in accuracy nor reduced power consumption, and are therefore not well suited to certain neural network data processing applications.
A new memory architecture is desired which provides the ability to trade off memory access time against accuracy of retrieved data and power consumption, and which provides an arrangement of data in the memory array giving incrementally greater accuracy for most significant bit data than for least significant bit data.
OBJECTS OF THE INVENTION
A first object of the invention is a memory array which provides a shorter bitline path for most significant bits (MSB) of a memory word than for least significant bits (LSB).
A second object of the invention is a memory array comprising:
- a top memory cell array accessed by activating a wordline which causes the top memory cell array to output data onto one or more bitlines;
- a bottom memory cell array accessed by activating a wordline which causes the bottom memory cell array to output data onto one or more bitlines;
- a wordline controller configured to examine output data from the one or more bitlines, the wordline controller modifying a wordline pulse width until at least two of the following distinct error states occur:
- a high error rate where an MSB of a memory cell has an error rate in the range of 2% to 15%, or approximately 10%;
- a moderate error rate where an MSB of a memory cell has an error rate in the range of 0.5% to 2%, or approximately 1%;
- a low error rate where an MSB of a memory cell has an error rate in the range of 0.005% to 0.5%, or approximately 0.1%; and
- an error-free error rate where an MSB of a memory cell has an error rate less than 0.00034%.
SUMMARY OF THE INVENTION
A static random access memory (SRAM) comprises at least one memory cell array which is activated by at least one wordline driven by a controller. The memory cell array has output bitlines on which each activated memory cell asserts output data to an input/output (I/O) controller, which provides the output data to an output port of the SRAM. The controller is configured to modify a pulse width of the at least one wordline until a particular output error rate is reached, where the output error rate may be selected to fall into at least two, and preferably four or more, ranges, such as a high error rate of approximately 10% for an MSB of a memory cell, a medium error rate of approximately 1% for an MSB of a memory cell, a low error rate of approximately 0.1% for an MSB of a memory cell, and an error-free rate which may be defined as a six sigma rate (corresponding to an error rate of less than 0.00034%). In one example of the invention, the memory cell array is configured such that a wordline has a shortest length from a controller source to a memory cell column for a low memory address such as 0x0000 (the 0x prefix indicating hexadecimal notation), and a longest length for a high memory address such as 0xFFFF for the 64K by 32 bit word memory of the present examples. In another example of the invention, the activation of a memory cell by the wordline results in the memory cell driving a plurality of bitlines carrying the memory cell output data, where the memory bits are assigned such that a bitline for an MSB is shorter than a bitline for a corresponding LSB of the same memory address.
A wordline driver has a variable width control line, such that the width of an activation signal carried by the wordline can be shortened to reduce power consumption of the memory in exchange for an increased error rate, and the wordline driver can be configured to provide high error rate, medium error rate, low error rate, or error-free operation, trading error rate against memory speed and power consumption. A calibration routine is provided which associates a wordline pulse width with each of the associated error rates.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a plan view of a memory cell array with IO, control, and drivers, showing a longest wordline path.
FIG. 1B is a plan view of a memory cell array with IO, control, and drivers, showing a shortest wordline path.
FIG. 2 is a plot comparing a wordline waveform for a low address and short wordline length with a wordline waveform for a high address and long wordline length.
FIG. 3 is a plot of wordline delay vs address which includes a bitline delay associated with bit position for each memory address.
FIG. 4 provides examples of four error rates which may be associated with wordline active widths, and example applications for the associated error rates.
FIG. 5 is a schematic diagram example for a wordline width controller having four selectable wordline widths.
FIG. 6 shows waveform plots for the four selectable wordline widths of FIG. 5.
FIG. 7 is a flowchart for calibration of wordline pulse widths to associated error rates.
DETAILED DESCRIPTION OF THE INVENTION
In the present application, like reference numbers refer to like structures. References to “approximately” a nominal value are understood to be in the range of ⅕ of the nominal value to 5x the nominal value. References to “on the order of” a nominal value are understood to be in the range of ⅒ of the nominal value to 10x the nominal value. Other values, such as 200 ps of wordline delay over the address range and 20 ps of wordline delay over the data bits, are for example use only, and depend on the address and data size of the memory as well as its physical layout.
FIG. 1A shows an example memory cell array 102 in an example chip layout, where top memory cell array 130A and bottom memory cell array 138A comprise arrays of SRAM memory cells. The memory cells are arranged by sequential address in sequential columns, and each address corresponds to a column which is activated by a wordline driven by controller 136A. A column of memory cells activated by a particular wordline drives a plurality of bitlines, which deliver the associated output data from the column of memory cells to top I/O drivers 132A and bottom I/O drivers 140A, which in turn deliver the data, shown as 8 bit bytes, to multiplexer 114, which may provide the output data in selectable bytes as data output 116.
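For illustration only, the byte selection performed by multiplexer 114 may be modeled behaviorally in software; the sketch below is not part of the claimed circuit, and the function and parameter names are hypothetical names chosen for explanation.

```python
def select_byte(word32: int, byte_sel: int) -> int:
    """Behavioral model of multiplexer 114: return one of the four bytes of
    the 32 bit word assembled from the top and bottom I/O drivers.
    byte_sel = 0 selects the least significant byte, 3 the most significant."""
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be 0..3 for a 32 bit word")
    return (word32 >> (8 * byte_sel)) & 0xFF

# Example: the most significant byte of 0x12345678 is 0x12.
assert select_byte(0x12345678, 3) == 0x12
```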
FIG. 1A shows an example longest wordline path for a high address such as 0xFFFF, and FIG. 1B shows an example shortest wordline path for a low address such as 0x0000. In the longest wordline path example of FIG. 1A, controller output 104A drives wordline 108A to the high address memory column furthest from the controller output 104A, with wordline 106A transferring the memory column enable signal to a corresponding column of memory cells, the outputs of which are taken as an example 32 bits (four bytes), two bytes from each of the top and bottom memory arrays. The top memory array cells drive bitlines 110A and the bottom memory array cells drive bitlines 112A. The data bits for a column are arranged with the MSB closest to the innermost control lines and the LSB closest to the outermost control lines, the outermost bits having comparatively longer wordline paths. Accordingly, the MSB has an incrementally shorter bitline path than an LSB for any given memory address. FIG. 1B shows a shortest path wordline example for a low value address, where controller output 104B drives comparatively short wordline 108B to column wordline 106B of a respective column of low value address memory cells. The outputs of the memory cells are transferred via bitlines 110B and 112B to top I/O drivers 132B and bottom I/O drivers 140B, which drive the example 32 bit output multiplexer 114 for byte selection 120, as was also shown in FIG. 1A.
FIG. 2 shows a plot of wordline waveform signal integrity as delivered to a low value address memory cell, where wordline waveform 210, with the shortest wordline path length to the memory cell array, has a comparatively fast risetime 220 and a pulse width 203 from turn on 202 to turn off 204. Waveform 212 shows an example wordline waveform received by a distant memory cell associated with a high address value and the longest wordline path length, where the slew rate 222 is reduced by the increased RC time constant and the limited drive current of the wordline, resulting in a reduced activation duration 207 from turn on 206 to turn off 208. Because the activation time 207 for a remote (high address value) memory cell column is reduced compared to the activation time 203 for a nearby (low address value) memory cell column, the selection of the wordline pulse width (activation duration) for the memory is governed by the worst case pulse width 207 for error-free operation, and pulse width 203 is correspondingly greater, causing unnecessary incremental power consumption with little other benefit.
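The relationship between wordline path length and effective activation duration may be illustrated with a simple first order RC model. The following sketch is illustrative only; the 200 ps figure echoes the example value given earlier, and the assumed linear dependence of the RC time constant on address is an assumption for explanation, not a measured characteristic.

```python
def effective_pulse_width_ps(driven_width_ps: float,
                             address: int,
                             max_address: int = 0xFFFF,
                             rc_max_ps: float = 200.0) -> float:
    """First order model: the wordline RC time constant grows roughly
    linearly with distance (address), degrading the slew rate and
    shortening the time the wordline stays above the cell's activation
    threshold (duration 207 vs duration 203 in FIG. 2)."""
    rc_ps = rc_max_ps * (address / max_address)   # assumed linear growth
    # Assume roughly one RC is lost on the rising edge and one on the
    # falling edge before the waveform crosses the activation threshold.
    return max(0.0, driven_width_ps - 2.0 * rc_ps)

# A 1000 ps driven pulse arrives at full width at address 0x0000 but is
# shortened by roughly 400 ps at the far end of the array (0xFFFF).
print(effective_pulse_width_ps(1000.0, 0x0000))   # 1000.0
print(effective_pulse_width_ps(1000.0, 0xFFFF))   # 600.0
```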
FIG. 3 shows a plot 304 of an example wordline delay time 314 vs memory address 316 accessed, from a low address to a high address. Plot 304 is trapezoidal to also show the variation in access time associated with an MSB 306 compared to an LSB 308 for each given memory address, corresponding to the MSB/LSB bitline paths of FIGS. 1A and 1B. The line 304 passing through LSB point 308 may represent the wordline delays for the LSB over the address range, where the lower wordline delays of the line passing through points 302, 306, and 310 may represent the wordline delays for the MSB over the address range, with the other intermediate data bits being linearly arranged vertically according to bit significance. Note from the layout of FIGS. 1A and 1B that for a 32 bit word, the “MSB” and “LSB” bits are arranged as two 16 bit values with b0 and b16 as “LSB” and b15 and b31 as “MSB” by position. In an alternative arrangement, the data bits are arranged by wordline length, such that the wordline column of [b31:b0] is physically arranged as [b31 b15 b30 b14 b29 b13 b28 b12 b27 b11 b26 b10 b25 b9 b24 b8 b23 b7 b22 b6 b21 b5 b20 b4 b19 b3 b18 b2 b17 b1 b16 b0], with b31 from the upper array and b15 from the lower array having the shortest wordline path distances, and b16 from the upper array and b0 from the lower array having the longest wordline path distances from source to I/O, as shown in FIGS. 1A and 1B.
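The alternative physical bit ordering described above can be generated programmatically; the sketch below simply reproduces the listed ordering for explanation and is not a required implementation.

```python
def interleaved_bit_order() -> list:
    """Return the physical column ordering [b31 b15 b30 b14 ... b16 b0], in
    which upper-array bit b(k+16) and lower-array bit b(k) share the same
    wordline path distance, shortest for k = 15 and longest for k = 0."""
    order = []
    for k in range(15, -1, -1):   # step through significance within each half
        order.append(k + 16)      # upper array bit
        order.append(k)           # lower array bit
    return order

assert interleaved_bit_order()[:4] == [31, 15, 30, 14]
assert interleaved_bit_order()[-2:] == [16, 0]
```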
FIG. 4 shows four example cases for error rates: case 1 is a high error rate, such as a 10% MSB error rate, for training neural network data with a large dataset; case 2 is a moderate MSB bit error rate, such as for forming neural network inferences from an audio data stream or noisy image data; case 3 is a low MSB bit error rate, or alternatively a bit error rate where the MSB has no bit errors and only LSB errors are acceptable (such as training on the standard National Institute of Standards and Technology ImageNet dataset available at NIST.gov); and case 4 is a rate where virtually no errors are tolerable on any bits, such as a six sigma error rate of less than 0.00034%.
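For illustration only, the four example cases of FIG. 4 can be summarized as a lookup from operating mode to a nominal MSB error-rate bound; the numeric bounds below simply restate the example values given above and are not limits of the invention.

```python
# Nominal MSB error-rate bounds (as fractions) for the four example cases of FIG. 4.
ERROR_RATE_CASES = {
    "case1_high":       {"msb_error_rate": 0.10,      "example": "NN training on a large, noisy dataset"},
    "case2_moderate":   {"msb_error_rate": 0.01,      "example": "NN inference on audio or noisy image data"},
    "case3_low":        {"msb_error_rate": 0.001,     "example": "training where only LSB errors are acceptable"},
    "case4_error_free": {"msb_error_rate": 0.0000034, "example": "six sigma, essentially error-free operation"},
}

def select_case(required_msb_error_rate: float) -> str:
    """Pick the least restrictive case whose bound still meets the requirement."""
    for name in ("case1_high", "case2_moderate", "case3_low", "case4_error_free"):
        if ERROR_RATE_CASES[name]["msb_error_rate"] <= required_msb_error_rate:
            return name
    return "case4_error_free"

# Example: a 5% tolerable MSB error rate maps to the moderate case.
assert select_case(0.05) == "case2_moderate"
```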
FIG. 5 shows an example wordline width controller, with associated waveforms shown in FIG. 6. A system clock 506 and a memory request 504 may generate a Mem_Clk 508 signal from the assertion of the memory request 504, and Mem_Clk 508 presets D flip flop 526 to assert Wordline_Out 530. The Mem_Clk 508 signal is also fed to AND gates 514, 516, 518, and 520, each of which is individually enabled by delay select lines DLY1, DLY2, DLY3, and DLY4, only one of which is asserted at a time, resulting in a variable length reset signal 524 which clears the output of DFF 526 to 0, de-asserting Wordline_Out 530. The assertion of DLY1 has the shortest delay and results in the shortest Wordline_Out activation time, and the assertion of DLY2 enables AND gate 516 and results in an incrementally greater delay associated with the two inverters at the input of AND gate 516. DLY3 enables AND gate 518 and provides an incrementally greater delay through the combination of Delay1 510 (with inverted output) and the gate delay to the input of AND gate 518, and DLY4 provides the longest delay, associated with Delay2 512 plus the three inverters driving AND gate 520. Accordingly, DLY4 may be associated with an error-free wordline pulse width, DLY3 with a low error rate pulse width, DLY2 with a medium error rate pulse width, and DLY1 with a high error rate pulse width, according to the reduced wordline pulse width (activation time) associated with each, as shown in the Wordline_Out waveforms and associated DLY values 618, 620, 622, and 624, respectively, resulting in wordline pulse widths (activation durations) 602, 604, 608, and 610, respectively.
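A behavioral sketch of the wordline width controller of FIG. 5 is given below. The delay values are placeholders chosen for explanation; the actual pulse widths depend on delay elements 510 and 512, the inverter and gate delays, and process, voltage, and temperature.

```python
# Behavioral model of FIG. 5: a one-hot DLY selection gates a delayed copy of
# Mem_Clk onto the reset 524 of flip flop 526, ending the Wordline_Out pulse.
# Delay values (in ps) are illustrative placeholders, not characterized silicon.
DLY_RESET_DELAY_PS = {
    "DLY1": 300,    # shortest pulse: high error rate mode
    "DLY2": 500,    # two-inverter delay added: medium error rate mode
    "DLY3": 800,    # Delay1 510 in the reset path: low error rate mode
    "DLY4": 1200,   # Delay2 512 in the reset path: error-free mode
}

def wordline_pulse_width_ps(dly_select: str) -> int:
    """Wordline_Out rises when Mem_Clk presets DFF 526 and falls when the
    selected delayed reset clears it, so the pulse width equals the delay
    of the selected reset path."""
    return DLY_RESET_DELAY_PS[dly_select]

assert wordline_pulse_width_ps("DLY1") < wordline_pulse_width_ps("DLY4")
```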
FIG. 7 shows an example flowchart for associating wordline error rates with pulse widths. The wordline pulse width may initially be set 704 to a maximum pulse width such as DLY4, and loop steps 705, 706, 708, and 714 result in reading a word N times in step 705 and comparing it to a reference value stored in a controller register, followed by estimating an error rate 706, determining whether the error rate for a given DLY value is below a threshold value 708, and decreasing the wordline pulse width 714 if so (such as by changing from DLY4 to DLY3, or from DLY3 to DLY2, etc.), until the DLY value associated with the lowest pulse width which satisfies a particular error rate is reached. At that time the pulse width may optionally be incremented by one DLY step as a safety margin to ensure an error rate which is no greater than the desired one, and the DLY value associated with the particular error rate is saved in step 712 for future reference. In this manner, each of the error rates, such as the examples of FIG. 4, has an associated DLY value which may be used for the associated data types of FIG. 4.
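The calibration flow of FIG. 7 may be sketched as follows. The read_word_n_times hook and the reference_value parameter are hypothetical names chosen to illustrate the controller interface, and the one-step safety margin is optional, as described above.

```python
def calibrate_pulse_width(read_word_n_times, reference_value,
                          error_rate_target, n_reads=1000,
                          dly_order=("DLY4", "DLY3", "DLY2", "DLY1"),
                          add_margin=True):
    """Walk from the widest pulse (DLY4) toward the narrowest (DLY1),
    estimating the error rate at each setting, and return the narrowest
    setting that still satisfies error_rate_target (steps 704-714 of FIG. 7)."""
    best = dly_order[0]
    for dly in dly_order:
        errors = sum(1 for word in read_word_n_times(dly, n_reads)
                     if word != reference_value)      # step 705
        error_rate = errors / n_reads                 # step 706
        if error_rate <= error_rate_target:           # step 708
            best = dly                                # keep shrinking (step 714)
        else:
            break                                     # previous setting was the last good one
    if add_margin and best != dly_order[0]:
        best = dly_order[dly_order.index(best) - 1]   # optional one-step safety margin
    return best                                       # saved per error rate (step 712)
```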
In another example of the invention, since a fixed DLY value is associated with a particular data type, the error rate for a particular DLY value may increase with address value. In this example, data requiring a lower error rate may be stored at low addresses and data tolerating higher error rates may be stored at high addresses. Similarly, in certain neural network applications where LSB errors cause fewer inference errors than MSB errors, it is preferable to arrange the memory cells to have a shortest path for the MSB and incrementally longer paths for the LSB.
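A simple placement policy implied by this example is sketched below; the error-tolerance tags and the policy of filling low addresses first are illustrative assumptions rather than requirements of the invention.

```python
def assign_addresses(buffers, address_space=0x10000):
    """Place the most error-sensitive data at the lowest (fastest, most
    reliable) addresses and the most error-tolerant data at the highest.
    Each buffer is a (name, size_in_words, error_tolerance) tuple, where a
    smaller error_tolerance means the data is less tolerant of bit errors."""
    next_addr = 0
    placement = {}
    for name, size, _tol in sorted(buffers, key=lambda b: b[2]):
        if next_addr + size > address_space:
            raise MemoryError("address space exhausted")
        placement[name] = next_addr
        next_addr += size
    return placement

# Example: weights (least error tolerant) land at low addresses, activations higher.
print(assign_addresses([("activations", 0x4000, 0.01), ("weights", 0x4000, 0.0001)]))
```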
The present examples are provided for illustrative purposes only, and are not intended to limit the invention to only the embodiments shown.