The present invention relates to circuitry that can be used to perform sum-of-products operations.
In neuromorphic computing systems, machine learning systems and circuitry used for some types of computation based on linear algebra, the sum-of-products function can be an important component. The function can be expressed as follows:

f = Σ (Xi * Wi), summed over the index i
In this expression, each product term is a product of a variable input Xi and a weight Wi. The weight Wi can vary among the terms, corresponding for example to coefficients of the variable inputs Xi.
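For illustration only, and not as part of the claimed subject matter, the function can be sketched in a few lines of Python, where the vectors `x` and `w` are hypothetical example values:

```python
# Sum-of-products: y = sum over i of X(i) * W(i).
# x holds the variable inputs X(i); w holds the weights W(i).
def sum_of_products(x, w):
    assert len(x) == len(w)
    return sum(xi * wi for xi, wi in zip(x, w))

y = sum_of_products([1.0, 2.0, 3.0], [0.5, 0.25, 0.125])
# 1.0*0.5 + 2.0*0.25 + 3.0*0.125 = 1.375
```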
The sum-of-products function can be realized as a circuit operation using cross-point array architectures in which the electrical characteristics of cells of the array effectuate the function.
For high-speed implementations, it is desirable to have a very large array so that many operations can be executed in parallel, or very large sum-of-products series can be performed. In some systems, there can be a very large number of inputs and outputs, so that the total current consumption can be large.
Also, artificial intelligence (AI) functions include large scale matrix multiplication, involving multiply-and-accumulate (MAC) steps (i.e., sum-of-products) using multi-bit weights, which can require very dense memory and high-bandwidth data communications.
Recent advances in AI hardware have been directed to high performance and low power solutions. To meet these needs, “in-memory computation” or “processor-in-memory” implementations have been proposed. These technologies can reduce data movement requirements to save power and latency.
It is desirable to provide structures for sum-of-products operations suitable for implementation in large arrays, and that can be flexible, high-capacity and energy-efficient.
Multiply-and-accumulate technology, for sum-of-products operations on a large scale, is described based on in-memory computation using a plurality of NAND blocks. NAND blocks used in examples described herein can be implemented using 3D NAND flash technologies.
Means are described for applying input signals to a multi-member set of bit lines coupled to a NAND block in the plurality of NAND blocks, for connecting sets of NAND strings in the NAND block to respective bit lines in the set of bit lines, and for sensing a sum-of-currents on a source line from the set of bit lines through the respective sets of NAND strings. The conductances (or the reciprocal, resistances) of the sets of NAND strings are determined by the data stored in the memory cells on the NAND strings.
A NAND block in the plurality of NAND blocks, for an embodiment described herein, includes a plurality of NAND strings disposed between bit lines in a multi-member set of bit lines and a source line for the NAND block. The NAND strings have string select switches to selectively connect the NAND strings to corresponding bit lines. The NAND strings include a plurality of memory cells arranged in series between the string select switches and a ground select switch by which the NAND string is connected to a source line. Word lines are coupled to the gates of memory cells in corresponding word line levels of the NAND block. Likewise, string select lines are coupled to the gates of string select switches in corresponding rows of NAND strings.
For a particular NAND block, the multi-member set of bit lines can include B members, and the NAND block can include a set of string select lines having at least S members. In this configuration, the NAND block includes an array of B*S NAND strings, including B columns and S rows of NAND strings. A set of string select drivers are operable in the computation mode to connect S NAND strings (one from each row) in a column of the NAND block to each bit line in the multi-member set of bit lines. In this manner, the computation mode current on the source line is a sum of B product terms, and each product term is a function of an input signal on one of the bit lines in the multi-member set of bit lines times the conductance of the S NAND strings connected to the bit line.
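A non-limiting behavioral sketch of this configuration follows, with hypothetical conductance values standing in for the programmed cell states:

```python
def source_line_current(x, g):
    """Computation-mode current on the source line of one NAND block.

    x[i]    -- input signal on bit line BL(i), for i = 0..B-1
    g[i][j] -- conductance of the NAND string in column i, row j
               (S rows), set by the data stored at the selected level
    """
    # Each product term is the input on BL(i) times the summed
    # conductance of the S NAND strings connected to that bit line.
    return sum(xi * sum(gi) for xi, gi in zip(x, g))

# B = 2 bit lines, S = 2 string select lines: two product terms,
# each weight formed by two NAND strings in parallel.
i_sl = source_line_current([1.0, 2.0], [[0.5, 0.5], [1.0, 0.0]])
# 1.0*(0.5+0.5) + 2.0*(1.0+0.0) = 3.0
```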
A circuit is described that includes a plurality of NAND blocks, each operable in a computation mode and a memory mode. A bus system connects to the inputs and outputs of the plurality of NAND blocks, whereby input data for a sum-of-products operation can be applied to the NAND blocks, and coefficients for the sum-of-products operation can be stored in the NAND blocks.
The input and output circuits used in computation and memory modes can be shared, and in some cases different circuits can be used for the different modes. The computation mode input of a given NAND block can include a set of bit line drivers to apply input data in parallel to the bit lines of the given NAND block. The computation mode output of the given NAND block can include a multi-bit sense amplifier to sense output data on a source line of the given NAND block. The memory mode input of the given NAND block can comprise a page buffer circuit, which can include the input drivers for the computation mode, or can be a separate circuit.
In other aspects of the technology, a structure of a NAND block suitable for use in a multiply-and-accumulate accelerator is described. Also, in other aspects of the technology, an integrated circuit is described comprising a plurality of multiply-and-accumulate tiles, where each tile comprises a plurality of NAND blocks as described above.
A method for in-memory computation of a sum-of-products is described, comprising:
a) storing coefficient data w(i,j) for a product term X(i)*W(i) in cells at a level L(k) of a NAND block in a column C(i) of NAND strings on string select lines SSL(j) coupled to bit line BL(i), for i going from 1 to B, j going from 1 to S, and for k equal to a selected word line level;
b) applying inputs X(i) to bit lines BL(1) to BL(B), and string select voltages to string select lines SSL(1) to SSL(S), and a word line compute voltage to cells in the selected word line level (simultaneously or in a combination overlapping in time to bias the cells for sensing);
c) combining currents through the columns C(1) to C(B) of NAND strings connected to bit lines BL(1) to BL(B) on a source line for the NAND block; and
d) sensing a current magnitude on the source line to generate an output signal representing the sum-of-products.
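As a non-limiting sketch, steps a) through d) can be modeled in Python; the array dimensions, cell data, inputs and unit conductance G0 below are hypothetical:

```python
B, S = 4, 3          # hypothetical counts of bit lines and string select lines
G0 = 1.0             # assumed unit conductance of a conducting NAND string

# a) store single-bit coefficient data w[i][j] at the selected level k
w = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 1]]

# b) apply inputs X(i) to bit lines BL(1)..BL(B), with all S string
#    select lines selected and the compute voltage on word line level k
x = [0.1, 0.2, 0.3, 0.1]

# c) combine the currents of all B*S selected strings on the source line
i_source = sum(x[i] * G0 * w[i][j] for i in range(B) for j in range(S))

# d) the sensed current magnitude represents the sum-of-products:
#    0.1*2 + 0.2*2 + 0.3*3 + 0.1*1 = 1.6
```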
An array of NAND blocks arranged as described herein can be operated in a pipeline fashion, supporting high throughput operations, usable for example in inference mode operations of machine learning systems.
Applying technology described herein, a dense and energy-efficient multiply-and-accumulate accelerator is provided. Embodiments can be configured to execute on the order of tera-operations per second (TOPS) per watt.
Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.
The following description will typically be with reference to specific structural embodiments and methods. It is to be understood that there is no intention to limit the technology to the specifically disclosed embodiments and methods but that the technology may be practiced using other features, elements, methods and embodiments. Preferred embodiments are described to illustrate the present technology, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a variety of equivalent variations on the description that follows.
As used herein, the term “coupled” means operatively coupled. Items that are coupled in this sense are not necessarily directly connected, and there may be intervening items between the coupled items.
In the illustrated example, the output of the summation is applied to a sigmoid function to produce an output that ranges in a non-linear fashion between a minimum and a maximum such as between 0 and 1. Other activation functions can be used as well, such as a logit function, or rectifier functions. The sum-of-products operation can be applied as well in configurations not neuromorphic or not otherwise considered to model neurological systems.
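For illustration only, a sigmoid activation applied to the summation output can be sketched as follows (the function and value names are not part of the described circuitry):

```python
import math

def sigmoid(z):
    # Non-linear activation ranging between a minimum of 0 and a maximum of 1.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w):
    # Sum-of-products followed by the sigmoid activation function.
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)))
```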
Neural networks, including convolutional neural networks, and other types of neural networks, often comprise logic organized in layers, in which each layer can receive as input the output of a previous layer, possibly subjected to an activation function, and execute hundreds or thousands of sum-of-products operations in parallel, the output of which is applied to a subsequent activation function or other logic unit.
Each of the NAND strings in the array of NAND strings includes a string select switch coupled to a corresponding one of the string select lines SSL(1) to SSL(S), which is used to connect and disconnect the corresponding NAND string to and from its bit line. The NAND string 200 and the NAND string 202 in a first row of the array are coupled to the first string select line SSL(1) for the block. The NAND string 201 and the NAND string 203 in an Sth row of the array are coupled to the Sth string select line SSL(S). In the illustrated example, two string select lines are shown for simplicity of the figure. It is contemplated that a given NAND block can be coupled to many string select lines in a given implementation, along with corresponding numbers of rows of NAND strings in the array of NAND strings.
Each of the NAND strings in the array of NAND strings includes a ground select switch coupled to a ground select line GSL1 for the block, which is used to connect the corresponding NAND string in the array of NAND strings to the source line SL1 for the block. In some embodiments, the ground select switches on corresponding NAND strings can be controlled by separate ground select lines with appropriate decoding circuitry to support the memory mode and computation mode as described herein.
Each of the NAND strings in the array of NAND strings includes a plurality of memory cells arranged in series between the string select switch and the ground select switch, coupled to corresponding word lines. In this example, all of the word lines in a given level of the NAND block are coupled in common to a single word line conductor, or to separate word line conductors controlled in common by a single word line decoder, such that all, or a selected plurality of, the rows of memory cells in a given level of the NAND block can receive the same word line signal. In this example, the NAND strings of the NAND block are vertical NAND strings including 32 levels of memory cells coupled to 32 word lines WL0-WL31.
In a memory mode of operation, data can be written into the individual memory cells using program and erase operations supported by a page buffer (not shown) coupled to the plurality of bit lines of the block. In the memory mode operations, typically, one of the rows of NAND strings in the array of NAND strings is selected using a selected string select line. In this case, one of the NAND strings in each column of the array of NAND strings is coupled to one of the bit lines. A page buffer can be utilized to program a data pattern into the individual NAND strings in a selected row of the array of NAND strings coupled to a plurality of bit lines in parallel at each level of the NAND block. Also, the page buffer in the memory mode can be utilized to read data stored in the memory cells in a selected row of the array of NAND strings at each level of the NAND block.
In a memory mode, coefficients of a sum-of-products operation can be stored into the NAND block. The NAND cells in the NAND block can be implemented using a single-bit-per-cell technology, or a multiple-bit-per-cell technology. In some embodiments a single-bit-per-cell technology can be preferred as the coefficients are stored in a manner distributed across the plurality of NAND strings as discussed below. In other embodiments, multiple-bit-per-cell technology can be used to provide even greater precision in the programming of the coefficients into the NAND block.
Referring to the illustration in
The conductivity of the selected NAND strings is determined by the data stored in the memory cells at a selected level of the NAND block. In the illustration, the NAND strings at the level of word line WL1 are selected. Thus, as illustrated, the first coefficient W1 of the product term W1*X1 corresponds to the combination of coefficient values w(1-1):w(1-S), in the case that all S string select lines are driven to connect their corresponding NAND strings to the first bit line BL(1). The second coefficient W2 of the product term W2*X2 corresponds to the combination of coefficient values w(2-1):w(2-S), in the case that all S string select lines are driven to connect their corresponding NAND strings to the second bit line BL(2).
In the computation mode of operation described herein, a signal OUTPUT Y1 produced as a result of a sum-of-products operation of the NAND block is provided on the source line SL1 for the block. In the example illustrated in
As illustrated, the current path 290 includes the current on the bit line BL(1) through the NAND string 200 to the source line SL1, plus the current on bit line BL(1) through the NAND string 201 to the source line SL1, plus the current on bit line BL(2) through the NAND string 202 to the source line SL1, plus the current on bit line BL(2) through the NAND string 203 to the source line SL1.
This current accumulated from the paths 290 corresponds to the sum-of-products terms, W1*X1 and W2*X2. The coefficient W1 is a function of the data values w(1-1):w(1-S) in the column of memory cells coupled to word line WL1 and bit line BL(1), and the coefficient W2 is a function of the data values w(2-1):w(2-S) in the column of memory cells coupled to word line WL1 and bit line BL(2). Using S memory cells in each column of memory cells coupled to a given bit line to represent a coefficient of a product term enables use of high precision data (e.g., having multiple significant digits) to represent the coefficient. By controlling the number of rows selected by string select lines simultaneously during a computation mode operation, and thereby the number of memory cells used to represent the coefficient, the precision of the coefficient can be varied as suits the needs of a particular implementation. Thus, for a block including a number S of string select lines, a single coefficient, if an input data value is represented by a voltage on a single bit line, can be represented by data stored in a number of memory cells ranging from 1 to S.
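An illustrative, non-limiting sketch of this variable-precision scheme: assuming single-bit cells and an assumed unit conductance g0 per conducting string, a coefficient stored across S cells can take S+1 distinct levels, and selecting fewer rows trades precision for range:

```python
def coefficient(cell_bits, g0=1.0, rows_selected=None):
    # cell_bits: the S single-bit cell values w(i-1)..w(i-S) for one column.
    # rows_selected: optionally select fewer string select lines,
    # using fewer cells (coarser precision) to form the coefficient.
    if rows_selected is not None:
        cell_bits = cell_bits[:rows_selected]
    # The effective coefficient is the summed conductance of the
    # conducting strings in the selected rows of the column.
    return g0 * sum(cell_bits)
```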
The current produced in a single NAND string in a NAND block as described herein, during a computation mode operation, can be relatively small. Therefore, the combination of currents from all of the NAND strings in an array of NAND strings of the NAND block can remain within a range of current values efficiently sensed using a current sensing sense amplifier coupled to the source line SL1 for the block.
A NAND block as described herein can be implemented using a 3D NAND memory technology. Some examples, among a variety of alternatives known in the art, of NAND blocks including vertical NAND strings are described in U.S. Pat. No. 9,698,156, entitled VERTICAL THIN CHANNEL MEMORY, by Lue, issued 4 Jul. 2017; and U.S. Pat. No. 9,524,980, entitled U-SHAPED VERTICAL THIN CHANNEL MEMORY, by Lue, issued 20 Dec. 2016; and such patents are incorporated by reference as if fully set forth herein. Implementations can also be made using 2D NAND technology, in which the NAND block is logically defined across the plurality of 2D NAND arrays.
In the example illustrated, the coefficient W1 stored in block 1 is represented by the data values stored in memory cells in a column of NAND strings at a selected level of the block. Thus, in block 1, for a selected level at the level of word line WL1, the coefficient W1 in block 1, bit line BL(1) and string select lines SSL(1):SSL(N) corresponds to a combination of the data values w(1-1,1):w(1-N,1). Likewise, for a selected level at the level of word line WL1, the coefficient W2 in block 1, bit line BL(2) and string select lines SSL(1):SSL(N) corresponds to a combination of the data values w(2-1,1):w(2-N,1). In block 2, for a selected level at the level of word line WL1, the coefficient W1 in block 2, bit line BL(1) and string select lines SSL(1):SSL(N) corresponds to a combination of the data values w(1-1,2):w(1-N,2). Likewise, for a selected level at the level of word line WL1, the coefficient W2 in block 2, bit line BL(2) and string select lines SSL(1):SSL(N) corresponds to a combination of the data values w(2-1,2):w(2-N,2).
The pattern of NAND blocks illustrated in
A page buffer and input driver 510 is coupled to the plurality of bit lines BL(1:B). The page buffer includes bit line drivers coupled to bit lines in the plurality of bit lines, and sense amplifiers coupled to bit lines in the plurality of bit lines. The page buffer/input driver 510 is used in the memory mode for holding data used for writing coefficients into the block. In the memory mode, the page buffer is used for the purposes of applying bit line voltages during the program and erase operations, for example, in a NAND flash array. In the computation mode, the input driver is used to apply bit line voltages corresponding to input data of the sum-of-products operations. In some embodiments, the page buffer can be utilized in the computation mode as a means for defining a data pattern of the input data values. In other embodiments, the input data values can be applied to the input drivers using an alternative memory path and decoder. Likewise, the input driver used in the computation mode can be included in the bit line driver in the page buffer used in the memory mode in some embodiments. In other embodiments, the bit line driver for the memory mode is different than the input driver for the computation mode.
A word line and string select line/ground select line decoder 512 is coupled to the plurality of word lines and to the plurality of string select lines and ground select lines of the NAND block. The decoder 512 is operable to select one, or more than one, of the plurality of string select lines for connection of NAND strings to corresponding bit lines. Also, the decoder 512 is operable to select a level of memory cells by driving a selected word line WL(SEL) at a particular level of the block. In the memory mode, the decoder 512 typically selects one string select line and one word line for a page read or page write operation. In the computation mode, the decoder 512 selects a plurality of string select lines including all of the string select lines, and one word line for a sum-of-products operation to produce a current on the source line 515 corresponding to the sum of the conductances of the selected NAND strings in the array.
The source line 515 is coupled to a sense amplifier 514, which converts the current on the source line 515 to a digital output value Y1. The sense amplifier 514 can be disabled during the memory mode. The page buffer/input driver 510 and the sense amplifier 514 can be coupled to a data bus system 520 for routing the input and output data among the NAND blocks for a large scale sum-of-products operation. A sequencing controller, not shown, can be used to coordinate operation of the decoders 512, the page buffer/input driver 510, and the sense amplifier 514, for the memory mode operation and for the computation mode operations.
These elements 510, 512, and 514, and the various embodiments described herein, comprise a means for applying input signals to a multi-member set of bit lines in the plurality of bit lines coupled to a NAND block in the plurality of NAND blocks, for connecting sets of NAND strings in the NAND block to respective bit lines in the set of bit lines, and for sensing a sum-of-currents from the set of bit lines through the respective sets of NAND strings.
In some embodiments, the word line decoders and word lines, and the string select line decoders and string select lines, can be arranged in horizontal lanes such that some or all of the NAND blocks in a horizontal lane share the same word line conductors and word line decoders, and the same string select lines and string select line decoders. In this 8×8 array, there can be 8 horizontal lanes of NAND blocks, for example.
The bit lines of the tile are coupled to a set of circuits like those used in large-scale NAND memory devices. The set of circuits includes high voltage switches 622 coupled to the bit lines to support program and erase operations on the bit lines. Also, the set of circuits includes bit line input drivers 621 for logic inputs during the computation mode, which are arranged to apply voltages to the bit lines, including bit line voltages corresponding to the input values of a sum-of-products operation. The set of circuits also includes page buffers 620 used to define a set of data for a program operation, used to store data during program verify operations, and used in the read operations for transfer of data out of the memory array. Also, the page buffers 620 can be utilized as part of the logic path to select voltage levels to be applied during the computation mode on the plurality of bit lines of a given sum-of-products operation.
The page buffers 620, bit line input drivers 621, and high voltage switches 622 can be arranged in a plurality of vertical lanes, such that all of the NAND blocks in a vertical lane share the same bit lines, and associated circuitry. In one example implementation, the array 600 of NAND blocks can comprise 8K bit lines, configured into eight 1K lanes of bit lines, coupled to a column of eight NAND blocks in this example.
Sense amplifiers 614 are coupled to the array 600 of NAND blocks. There can be for example 64 current sensing sense amplifiers, each coupled to one of the 64 NAND blocks in the plurality of NAND blocks of the array 600. In other embodiments, the set of sense amplifiers 614 can include one sense amplifier for each horizontal lane of the array 600. In other embodiments, the set of sense amplifiers 614 can include one sense amplifier for each vertical lane.
As illustrated in
The integrated circuit device 800 includes a set of source lines 855 coupled to corresponding NAND blocks in the array 860, and a set of bit lines 865 coupled to corresponding NAND blocks in the array 860.
A set of word lines is coupled to gates of the memory cells at corresponding levels of the NAND blocks, signals on the word lines in the set of word lines selecting respective levels of memory cells. Word line drivers 840 are coupled to a set of word lines 845.
A set of sensing circuits 850 is coupled to respective source lines in the set of source lines. For sum-of-products operations using the array, the source line sensing circuits 850 can sense current at source lines 855 from the memory array 860. Currents sensed at a particular source line in the set of source lines can represent a sum-of-products as discussed above. Sensed data from the source line sensing circuits 850 are supplied via data lines 853 to input/output circuits 891.
Bit line drivers in circuits 870 are coupled to page buffer 875, and to bit lines 865. For sum-of-products operations using the array, bit line drivers in circuits 870 can produce an input x(m) for each selected bit line.
Addresses are supplied on bus 830 from control logic (controller) 810 to page buffer 875, bit line drivers in circuits 870 and word line drivers 840. Page buffer 875, bit line drivers in circuits 870, and source line sensing circuits 850 are coupled to the input/output circuits 891 by lines 853, 885.
Input/output circuits 891 drive the data to destinations external to the integrated circuit device 800. Input/output data and control signals are moved via data bus 805 between the input/output circuits 891, the control logic 810 and input/output ports on the integrated circuit device 800 or other data sources internal or external to the integrated circuit device 800, such as a general purpose processor or special purpose application circuitry, or a combination of modules providing system-on-a-chip functionality supported by the memory array 860.
In the example shown in
The control logic 810 can be implemented using special-purpose logic circuitry as known in the art. In alternative embodiments, the control logic comprises a general-purpose processor, which can be implemented on the same integrated circuit, which executes a computer program to control the operations of the device. In yet other embodiments, a combination of special-purpose logic circuitry and a general-purpose processor can be utilized for implementation of the control logic.
Control logic 810 can also implement circuitry to support pipeline operations in a computation mode of the device. For example, the following table shows pipeline operations for three blocks with SRAM supported by logic to receive and operate on the outputs of the blocks.
The first integrated circuit 900 and second integrated circuit 950 can be mounted in a stacked chip configuration, in a multichip module or in other compact configurations, in some embodiments, in which the interconnect lines in the bus system 910, 911, 912 comprise inter-chip connections including for example through silicon vias, ball grid connections, and so on.
Thus, to compute a sum-of-products for a number B of product terms, the method includes selecting a NAND block coupled to B bit lines, a number S string select lines, and having a number L of word line levels (1100). To set up the computation, operating the NAND block in a memory mode, the method includes storing coefficient data w(i-j) for the product term W(i)*X(i) in a set of memory cells at a level L(k) of a column C(i) of NAND strings on a string select line SSL(j) coupled to bit line BL(i), for the index i of bit lines going from 1 to B, and the index j of SSL lines going from 1 to S, with the index k for the word line levels being held at a constant selected level. Using a NAND page programming operation, the coefficient data can be stored in the memory cells on NAND strings coupled to a single SSL line SSL(j). The NAND page programming operation is repeated for each of the SSL lines in the set of SSL lines SSL(1) to SSL(S) (1101).
In this manner, the coefficient W(i) is represented by the number S of data values (w(i-j) for j going from 1 to S) stored in the set of memory cells at a word line level k in the column C(i) of NAND strings coupled to the string select switches on string select lines SSL(1) to SSL(S).
In general, the number S can be equal to the physical number of string select lines. In some embodiments, the number S used to store a particular coefficient can be any number from 1 up to the maximum physical number of string select lines. Likewise, the number B of bit lines corresponds to the number of product terms to be summed in the NAND block. The number B can be equal to the maximum physical number of bit lines coupled to the block, or to a number of bit lines selected for a particular number of product terms.
Once the data is stored, the device can enter a computation mode. In the computation mode, the inputs X(i), represented by bit line voltages VBLi, are applied to the bit lines BL(i) in parallel, for i going from 1 to B, and can be applied simultaneously. Also, string select line select voltages, which are set to turn on the string select switches connecting the bit lines to the corresponding NAND strings, are simultaneously applied to all of the string select lines SSL(j) in parallel, for j going from 1 to S. Also, word line voltages are applied to the word line or word lines in the word line level k corresponding to the particular product term being computed (1102).
This has the effect of connecting a number S of NAND strings to each of the bit lines BL(i) in parallel, and applying a computation level word line voltage to the word line or word lines at the selected word line level k. At the same time the ground select lines are set to a select voltage level to turn on the ground select switches connecting the NAND strings to the source line. The data value in the cells at the selected word line level has the effect of setting the conductances of the NAND strings according to the coefficients stored.
Also, in the computation mode, the currents through the number S of NAND strings in columns C(1) to C(B) of NAND strings connected to the bit lines BL(1) to BL(B) are combined on the source line for the NAND block (1103).
As a result, the current on the source line corresponds to the sum of the B product terms being computed.
Finally in the computation mode, a sum-of-products output is generated by sensing the current magnitude on the source line (1104). This output can be a multibit digital output that is delivered to a bus system for routing according to the sequence of computations being executed.
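For illustration only, the conversion of the sensed source line current to a multibit digital output can be sketched as a simple quantizer; the full-scale current and bit width below are hypothetical parameters, not part of the described circuitry:

```python
def sense(i_source, i_full_scale, n_bits=3):
    # Map the source line current onto 2**n_bits digital output codes,
    # clamped to the valid code range.
    levels = 2 ** n_bits
    code = int(i_source / i_full_scale * levels)
    return min(max(code, 0), levels - 1)

# e.g., a 3-bit sense amplifier with a 100 uA full scale:
# 50 uA maps to code 4 of 0..7.
```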
The flowchart illustrates logic executed by a memory controller or by a memory device as described herein. The logic can be implemented using processors programmed using computer programs stored in memory accessible to the computer systems and executable by the processors, by dedicated logic hardware, including field programmable integrated circuits, and by combinations of dedicated logic hardware and computer programs. It will be appreciated that many of the steps can be combined, performed in parallel, or performed in a different sequence without affecting the functions achieved. In some cases, as the reader will appreciate, a rearrangement of steps will achieve the same results only if certain other changes are made as well. In other cases, as the reader will appreciate, a rearrangement of steps will achieve the same results only if certain conditions are satisfied. Furthermore, it will be appreciated that the flow chart shows only steps that are pertinent to an understanding of the technology, and it will be understood that numerous additional steps for accomplishing other functions can be performed before, after and between those shown.
In general, a high performance, low power multiply-and-accumulate accelerator circuit has been described. Implementations of the accelerator circuit can comprise 3D NAND flash blocks arranged for high capacity computation of sum-of-products operations. Utilizing the structures described herein, the current magnitudes required for sum-of-products computations can be very small and tightly controlled. In embodiments described herein, thousands and tens of thousands of cells can be operated together to support high throughput, low power sum-of-products operations.
In examples described herein, each block has a separated source line which can collect current from many NAND strings on a plurality of bit lines and string select lines simultaneously. The current on the source line can be coupled to a current sensing sense amplifier having a wide dynamic range, with an average current per block ranging from 10 μA to 100 μA, and having multi-bit sensing (2 bits, 3 bits or 4 bits, for example). The per-string current can be trimmed, by setting appropriate threshold voltages and driving voltages, to be as small as possible, such as in the range of less than 50 nanoamperes per string, enabling sensing of 1,000 to 10,000 strings in parallel in a single sense amplifier. For example, the word lines can be biased close to the middle between a program verify state and an erase state of a flash memory cell, and the bit line voltages can be reduced to levels below 0.3 V to produce a small per-string current.
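A back-of-envelope check of the figures above (the variable names are illustrative only): sensing 10,000 strings within the 100 μA upper block current implies roughly 10 nA average per string, comfortably inside the less-than-50 nA per-string bound:

```python
# Consistency check of the current budget stated in the text.
n_strings = 10_000                  # strings sensed in parallel (upper figure)
i_block = 100e-6                    # upper end of the 10-100 uA block current
i_per_string = i_block / n_strings  # average current per string (~10 nA)
assert i_per_string <= 50e-9        # within the <50 nA per-string bound
```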
The number of bit lines used for input in parallel to a NAND block and string select lines used for the purposes of storing the coefficients can be set as desired for a particular implementation. Also, the number of bit lines and string select lines used for a given computation can be logically determined in each cycle of the operation.
Because of the use of multiple string select lines, and thereby multiple NAND strings coupled to a single bit line in parallel, the coefficients or weights of each product term can be distributed into multiple memory cells in the NAND block. This enables high resolution of the coefficients, or effectively analog weight values, since each of the cells can be individually trimmed to a high accuracy using NAND flash programming operations. As a result, the reliability of the computations can be very high.
While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.
This application claims the benefit of U.S. Provisional Patent Application No. 62/791,037, filed 11 Jan. 2019; and also claims benefit of U.S. Provisional Patent Application No. 62/780,938, filed 18 Dec. 2018; which applications are incorporated herein by reference.
Published as US 2020/0192971 A1, Jun. 2020.