This invention relates generally to converting data from a parallel to a serial format and, more specifically, to techniques for variable rate parallel to serial shift registers.
Solid-state memory capable of nonvolatile storage of charge, particularly in the form of EEPROM and flash EEPROM packaged as a small form factor card, has recently become the storage of choice in a variety of mobile and handheld devices, notably information appliances and consumer electronics products. Unlike RAM (random access memory), which is also solid-state memory, flash memory is non-volatile, retaining its stored data even after power is turned off. In spite of the higher cost, flash memory is increasingly being used in mass storage applications. Conventional mass storage, based on rotating magnetic media such as hard drives and floppy disks, is unsuitable for the mobile and handheld environment. This is because disk drives tend to be bulky, are prone to mechanical failure and have high latency and high power requirements. These undesirable attributes make disk-based storage impractical in most mobile and portable applications. On the other hand, flash memory, both embedded and in the form of a removable card, is ideally suited to the mobile and handheld environment because of its small size, low power consumption, high speed and high reliability features.
EEPROM and electrically programmable read-only memory (EPROM) are non-volatile memory that can be erased and have new data written or “programmed” into their memory cells. Both utilize a floating (unconnected) conductive gate, in a field effect transistor structure, positioned over a channel region in a semiconductor substrate, between source and drain regions. A control gate is then provided over the floating gate. The threshold voltage characteristic of the transistor is controlled by the amount of charge that is retained on the floating gate. That is, for a given level of charge on the floating gate, there is a corresponding voltage (threshold) that must be applied to the control gate before the transistor is turned “on” to permit conduction between its source and drain regions.
The floating gate can hold a range of charges and therefore can be programmed to any threshold voltage level within a threshold voltage window. The size of the threshold voltage window is delimited by the minimum and maximum threshold levels of the device, which in turn correspond to the range of the charges that can be programmed onto the floating gate. The threshold window generally depends on the memory device's characteristics, operating conditions and history. Each distinct, resolvable threshold voltage level range within the window may, in principle, be used to designate a definite memory state of the cell.
The transistor serving as a memory cell is typically programmed to a “programmed” state by one of two mechanisms. In “hot electron injection,” a high voltage applied to the drain accelerates electrons across the substrate channel region. At the same time a high voltage applied to the control gate pulls the hot electrons through a thin gate dielectric onto the floating gate. In “tunneling injection,” a high voltage is applied to the control gate relative to the substrate. In this way, electrons are pulled from the substrate to the intervening floating gate.
The memory device may be erased by a number of mechanisms. For EPROM, the memory is bulk erasable by removing the charge from the floating gate by ultraviolet radiation. For EEPROM, a memory cell is electrically erasable, by applying a high voltage to the substrate relative to the control gate so as to induce electrons in the floating gate to tunnel through a thin oxide to the substrate channel region (i.e., Fowler-Nordheim tunneling). Typically, the EEPROM is erasable byte by byte. For flash EEPROM, the memory is electrically erasable either all at once or one or more blocks at a time, where a block may consist of 512 bytes or more of memory.
The memory devices typically comprise one or more memory chips that may be mounted on a card. Each memory chip comprises an array of memory cells supported by peripheral circuits such as decoders and erase, write and read circuits. The more sophisticated memory devices also come with a controller that performs intelligent and higher level memory operations and interfacing. There are many commercially successful non-volatile solid-state memory devices being used today. These memory devices may employ different types of memory cells, each type having one or more charge storage elements.
One simple embodiment of the split-channel memory cell is where the select gate and the control gate are connected to the same word line as indicated schematically by a dotted line shown in
A more refined embodiment of the split-channel cell shown in
When an addressed memory transistor within a NAND cell is read and verified during programming, its control gate is supplied with an appropriate voltage. At the same time, the rest of the non-addressed memory transistors in the NAND cell 50 are fully turned on by application of sufficient voltage on their control gates. In this way, a conductive path is effectively created from the source of the individual memory transistor to the source terminal 54 of the NAND cell, and likewise from the drain of the individual memory transistor to the drain terminal 56 of the cell. Memory devices with such NAND cell structures are described in U.S. Pat. Nos. 5,570,315, 5,903,495 and 6,046,935.
Memory Array
A memory device typically comprises a two-dimensional array of memory cells arranged in rows and columns and addressable by word lines and bit lines. The array can be formed according to a NOR type or a NAND type architecture.
NOR Array
Many flash EEPROM devices are implemented with memory cells where each is formed with its control gate and select gate connected together. In this case, there is no need for steering lines and a word line simply connects all the control gates and select gates of cells along each row. Examples of these designs are disclosed in U.S. Pat. Nos. 5,172,338 and 5,418,752. In these designs, the word line essentially performs two functions: row selection and supplying the control gate voltage to all cells in the row for reading or programming.
NAND Array
Block Erase
Programming of charge storage memory devices can only result in adding more charge to its charge storage elements. Therefore, prior to a program operation, existing charge in a charge storage element must be removed (or erased). Erase circuits (not shown) are provided to erase one or more blocks of memory cells. A non-volatile memory such as EEPROM is referred to as a “Flash” EEPROM when an entire array of cells, or significant groups of cells of the array, is electrically erased together (i.e., in a flash). Once erased, the group of cells can then be reprogrammed. The group of cells erasable together may consist of one or more addressable erase units. The erase unit or block typically stores one or more pages of data, the page being the unit of programming and reading, although more than one page may be programmed or read in a single operation. Each page typically stores one or more sectors of data, the size of the sector being defined by the host system. An example is a sector of 512 bytes of user data, following a standard established with magnetic disk drives, plus some number of bytes of overhead information about the user data and/or the block in which it is stored.
Read/Write Circuits
In the usual two-state EEPROM cell, at least one current breakpoint level is established so as to partition the conduction window into two regions. When a cell is read by applying predetermined, fixed voltages, its source/drain current is resolved into a memory state by comparing it with the breakpoint level (or reference current IREF). If the read current is higher than the breakpoint level, the cell is determined to be in one logical state (e.g., a “zero” state). On the other hand, if the current is less than the breakpoint level, the cell is determined to be in the other logical state (e.g., a “one” state). Thus, such a two-state cell stores one bit of digital information. A reference current source, which may be externally programmable, is often provided as part of a memory system to generate the breakpoint level current.
In order to increase memory capacity, flash EEPROM devices are being fabricated with higher and higher density as the state of the semiconductor technology advances. Another method for increasing storage capacity is to have each memory cell store more than two states.
For a multi-state or multi-level EEPROM memory cell, the conduction window is partitioned into more than two regions by more than one breakpoint such that each cell is capable of storing more than one bit of data. The information that a given EEPROM array can store is thus increased with the number of states that each cell can store. EEPROM or flash EEPROM with multi-state or multi-level memory cells have been described in U.S. Pat. No. 5,172,338.
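By way of illustration only (a behavioral sketch with hypothetical breakpoint values, not circuitry from the cited patent), resolving a multi-state cell amounts to comparing the sensed conduction current against an ordered list of breakpoint levels:

```python
# Minimal sketch of multi-state sensing: a sensed conduction current is
# resolved into one memory state by comparing it against breakpoint
# (reference) current levels. All numeric values are hypothetical.

def resolve_state(cell_current_ua, breakpoints_ua):
    """Return the state index for a sensed current.

    breakpoints_ua is sorted in decreasing order: higher conduction
    current (less floating-gate charge) maps to a lower state index,
    matching the two-state convention where current above the single
    breakpoint reads as the "zero" state.
    """
    for state, level in enumerate(breakpoints_ua):
        if cell_current_ua > level:
            return state
    return len(breakpoints_ua)   # lowest-current, most-programmed state

# Three breakpoints partition the conduction window into four regions,
# so each cell stores two bits of data.
BREAKPOINTS_UA = [30.0, 20.0, 10.0]   # microamps, illustrative only
assert resolve_state(35.0, BREAKPOINTS_UA) == 0
assert resolve_state(25.0, BREAKPOINTS_UA) == 1
assert resolve_state(5.0, BREAKPOINTS_UA) == 3
```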
In practice, the memory state of a cell is usually read by sensing the conduction current across the source and drain electrodes of the cell when a reference voltage is applied to the control gate. Thus, for each given charge on the floating gate of a cell, a corresponding conduction current with respect to a fixed reference control gate voltage may be detected. Similarly, the range of charge programmable onto the floating gate defines a corresponding threshold voltage window or a corresponding conduction current window.
Alternatively, instead of detecting the conduction current among a partitioned current window, it is possible to set the threshold voltage for a given memory state under test at the control gate and detect if the conduction current is lower or higher than a threshold current. In one implementation, the detection of the conduction current relative to a threshold current is accomplished by examining the rate at which the conduction current discharges the capacitance of the bit line.
As can be seen from the description above, the more states a memory cell is made to store, the more finely divided is its threshold window. This will require higher precision in programming and reading operations in order to be able to achieve the required resolution.
U.S. Pat. No. 4,357,685 discloses a method of programming a 2-state EPROM in which, when a cell is programmed to a given state, it is subjected to successive programming voltage pulses, each time adding incremental charge to the floating gate. In between pulses, the cell is read back or verified to determine its source-drain current relative to the breakpoint level. Programming stops when the current state has been verified to reach the desired state. The programming pulse train used may have increasing period or amplitude.
Prior art programming circuits simply apply programming pulses to step through the threshold window from the erased or ground state until the target state is reached. Practically, to allow for adequate resolution, each partitioned or demarcated region would require at least about five programming steps to traverse. The performance is acceptable for 2-state memory cells. However, for multi-state cells, the number of steps required increases with the number of partitions and, therefore, the programming precision or resolution must be increased. For example, a 16-state cell may require on average at least 40 programming pulses to program to a target state.
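To make the pulse-and-verify flow concrete, the following is a minimal behavioral sketch of such stepped programming; the fixed threshold step per pulse and all numeric values are assumptions for illustration, not taken from the cited patent:

```python
# Hedged sketch of program-and-verify: apply a programming pulse, read
# the cell back, stop once the target state verifies. The toy cell
# model (a fixed threshold step per pulse) is illustrative only.

def program_cell(read_state, apply_pulse, target_state, max_pulses=64):
    """Pulse and verify until target_state is reached; return pulse count."""
    for pulse in range(1, max_pulses + 1):
        apply_pulse()
        if read_state() >= target_state:    # verify step between pulses
            return pulse
    raise RuntimeError("cell failed to verify; would be mapped out as bad")

class ToyCell:
    """16-state cell: 4000 mV threshold window, 250 mV per state region."""
    def __init__(self, step_mv=50):
        self.vth_mv, self.step_mv = 0, step_mv
    def pulse(self):
        self.vth_mv += self.step_mv         # each pulse adds some charge
    def state(self):
        return self.vth_mv // 250           # five 50 mV steps per region

cell = ToyCell()
assert program_cell(cell.state, cell.pulse, target_state=8) == 40
# 40 pulses for a mid-window target, in line with the average quoted above.
```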
Factors Affecting Read/Write Performance and Accuracy
In order to improve read and program performance, multiple charge storage elements or memory transistors in an array are read or programmed in parallel. Thus, a logical “page” of memory elements is read or programmed together. In existing memory architectures, a row typically contains several interleaved pages. All memory elements of a page are read or programmed together. The column decoder will selectively connect each one of the interleaved pages to a corresponding number of read/write modules. For example, in one implementation, the memory array is designed to have a page size of 532 bytes (512 bytes plus 20 bytes of overhead). If each column contains a drain bit line and there are two interleaved pages per row, this amounts to 8512 columns with each page being associated with 4256 columns. There will be 4256 sense modules connectable to read or write in parallel either all the even bit lines or all the odd bit lines. In this way, a page of 4256 bits (i.e., 532 bytes) of data is read from or programmed into the page of memory elements in parallel. The read/write modules forming the read/write circuits 170 can be arranged into various architectures.
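The column arithmetic in this example can be checked mechanically; this short sketch simply reproduces the quoted figures:

```python
# Worked check of the interleaved-page figures quoted above.
PAGE_BYTES = 532            # 512 bytes of user data + 20 bytes overhead
PAGES_PER_ROW = 2           # even and odd interleaved pages

page_bits = PAGE_BYTES * 8                  # bits per page
columns = page_bits * PAGES_PER_ROW         # one drain bit line per column
sense_modules = page_bits                   # one per bit line read at once

assert (page_bits, columns, sense_modules) == (4256, 8512, 4256)
```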
Referring to
As mentioned before, conventional memory devices improve read/write operations by operating in a massively parallel manner on all even or all odd bit lines at a time. This architecture of a row consisting of two interleaved pages helps alleviate the problem of fitting the block of read/write circuits. It is also dictated by consideration of controlling bit-line to bit-line capacitive coupling. A block decoder is used to multiplex the set of read/write modules to either the even page or the odd page. In this way, whenever one set of bit lines is being read or programmed, the interleaving set can be grounded to minimize immediate neighbor coupling.
However, the interleaving page architecture is disadvantageous in at least three respects. First, it requires additional multiplexing circuitry. Second, it is slow in performance: to finish reading or programming the memory cells connected by a word line or in a row, two read or two program operations are required. Third, it is also not optimal in addressing other disturb effects such as field coupling between neighboring charge storage elements at the floating gate level when the two neighbors are programmed at different times, such as separately in odd and even pages.
The problem of neighboring field coupling becomes more pronounced with ever closer spacing between memory transistors. In a memory transistor, a charge storage element is sandwiched between a channel region and a control gate. The current that flows in the channel region is a function of the resultant electric field contributed by the field at the control gate and the charge storage element. With ever increasing density, memory transistors are formed closer and closer together. The field from neighboring charge elements then becomes a significant contributor to the resultant field of an affected cell. The neighboring field depends on the charge programmed into the charge storage elements of the neighbors. This perturbing field is dynamic in nature as it changes with the programmed states of the neighbors. Thus, an affected cell may read differently at different times depending on the changing states of the neighbors.
The conventional architecture of interleaving pages exacerbates the error caused by neighboring floating gate coupling. Since the even page and the odd page are programmed and read independently of each other, a page may be programmed under one set of conditions but read back under an entirely different set of conditions, depending on what has happened to the intervening page in the meantime. The read errors will become more severe with increasing density, requiring a more accurate read operation and coarser partitioning of the threshold window for multi-state implementation. Performance will suffer and the potential capacity in a multi-state implementation is limited.
United States Patent Publication No. US-2004-0060031-A1 discloses a high performance yet compact non-volatile memory device having a large block of read/write circuits to read and write a corresponding block of memory cells in parallel. In particular, the memory device has an architecture that reduces redundancy in the block of read/write circuits to a minimum. Significant savings in space as well as power are accomplished by redistributing the block of read/write modules into read/write module core portions that operate in parallel while interacting with a substantially smaller set of common portions in a time-multiplexing manner. In particular, data processing among the read/write circuits, between a plurality of sense amplifiers and data latches, is performed by a shared processor.
Therefore there is a general need for high performance and high capacity non-volatile memory. In particular, there is a need for a compact non-volatile memory with enhanced read and program performance having an improved processor that is compact and efficient, yet highly versatile for processing data among the read/write circuits.
A first set of aspects relates to a memory circuit that includes an array of non-volatile memory cells formed along multiple word lines and multiple columns. The columns are subdivided into N divisions, each division formed of a plurality of contiguous columns, and the word lines span all of the columns of the array. The memory circuit also includes N sets of access circuitry, each connectable to the columns of a corresponding division of the array. (N is an integer greater than one.) A deserializer circuit is connected to a data bus to receive data in a word-wide serial data format and connectable to the sets of access circuitry to transfer the received data to them, where the deserializer circuit transfers each of N words of data to a corresponding one of the sets of access circuitry in parallel according to a first clock signal. Column redundancy circuitry is connected to the deserializer circuit to provide it with defective column information. In converting data from a serial to a parallel format, the deserializer circuit skips words of the data in the parallel format based on the defective column information indicating that the column's location corresponds to a defective column.
An additional set of aspects concerns a memory circuit that includes an array of non-volatile memory cells formed along multiple word lines and multiple columns. The columns are subdivided into N divisions, each division formed of a plurality of contiguous columns, and the word lines span all of the columns of the array. The memory circuit also includes N sets of access circuitry, each connectable to the columns of a corresponding division of the array. (N is an integer greater than one.) A serializer circuit is connected to the sets of access circuitry to receive in parallel each of N words of data from a corresponding one of the sets of access circuitry and is connected to a data bus to transfer to it the received data in a word-wide serial data format according to a first clock signal. Column redundancy circuitry is connected to the serializer circuit to provide it with defective column information. In converting data from a parallel to a serial format, the serializer circuit skips words of the data in the parallel format based on the defective column information indicating that the column's location corresponds to a defective column.
Various aspects, advantages, features and embodiments of the present invention are included in the following description of exemplary examples thereof, which description should be taken in conjunction with the accompanying drawings. All patents, patent applications, articles, other publications, documents and things referenced herein are hereby incorporated herein by this reference in their entirety for all purposes. To the extent of any inconsistency or conflict in the definition or use of terms between any of the incorporated publications, documents or things and the present application, those of the present application shall prevail.
The control circuitry 310 cooperates with the read/write circuits 370 to perform memory operations on the memory array 300. The control circuitry 310 includes a state machine 312, an on-chip address decoder 314 and a power control module 316. The state machine 312 provides chip level control of memory operations. The on-chip address decoder 314 provides an address interface between that used by the host or a memory controller and the hardware address used by the decoders 330 and 370. The power control module 316 controls the power and voltages supplied to the word lines and bit lines during memory operations.
The entire bank of partitioned read/write stacks 400 operating in parallel allows a block (or page) of p cells along a row to be read or programmed in parallel. Thus, there will be p read/write modules for the entire row of cells. As each stack serves k memory cells, the total number of read/write stacks in the bank is given by r=p/k; equivalently, p=r*k. One example memory array may have p=512 bytes (512×8=4096 bits), k=8, and therefore r=512. In the preferred embodiment, the block is a run of the entire row of cells. In another embodiment, the block is a subset of cells in the row. For example, the subset of cells could be one half of the entire row or one quarter of the entire row. The subset of cells could be a run of contiguous cells or one every other cell, or one every predetermined number of cells.
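A quick check of this stack arithmetic with the example figures:

```python
# Read/write stack count: p cells served in parallel, k cells per
# stack, so the bank needs r = p / k stacks.
p = 512 * 8    # page width: 512 bytes = 4096 bits
k = 8          # memory cells served by each read/write stack
r = p // k

assert r == 512 and r * k == p
```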
Each read/write stack, such as 400-1, essentially contains a stack of sense amplifiers 212-1 to 212-k servicing a segment of k memory cells in parallel. A preferred sense amplifier is disclosed in United States Patent Publication No. 2004-0109357-A1, the entire disclosure of which is hereby incorporated herein by reference.
The stack bus controller 410 provides control and timing signals to the read/write circuits 370 via lines 411. The stack bus controller itself depends on the memory controller 310 via lines 311. Communication among the read/write stacks 400 is effected by an interconnecting stack bus 431 and controlled by the stack bus controller 410. Control lines 411 provide control and clock signals from the stack bus controller 410 to the components of the read/write stacks 400-1.
In the preferred arrangement, the stack bus is partitioned into a SABus 422 for communication between the common processor 500 and the stack of sense amplifiers 212, and a DBus 423 for communication between the processor and the stack of data latches 430.
The stack of data latches 430 comprises data latches 430-1 to 430-k, one for each memory cell associated with the stack. The I/O module 440 enables the data latches to exchange data with external circuits via an I/O bus 231.
The common processor also includes an output 507 for output of a status signal indicating a status of the memory operation, such as an error condition. The status signal is used to drive the gate of an n-transistor 550 that is tied to a FLAG BUS 509 in a Wired-Or configuration. The FLAG BUS is preferably precharged by the controller 310 and will be pulled down when a status signal is asserted by any of the read/write stacks. (The isolation latch IL 529 is discussed in the following section on bad column management.)
The input logic 510 receives data from the PBUS and outputs transformed data to a BSI node in one of the logical states “1”, “0”, or “Z” (float), depending on the control signals from the stack bus controller 410 via signal lines 411. A Set/Reset latch, PLatch 520, then latches BSI, resulting in a pair of complementary output signals, MTCH and MTCH*.
The output logic 530 receives the MTCH and MTCH* signals and outputs on the PBUS 505 transformed data in one of the logical states “1”, “0”, or “Z” (float), depending on the control signals from the stack bus controller 410 via signal lines 411.
At any one time the common processor 500 processes the data related to a given memory cell. For example,
The PBUS 505 of the common processor 500 has access to the SA latch 214-1 via the SBUS 422 when a transfer gate 501 is enabled by a pair of complementary signals SAP and SAN. Similarly, the PBUS 505 has access to the set of data latches 430-1 via the DBUS 423 when a transfer gate 502 is enabled by a pair of complementary signals DTP and DTN. The signals SAP, SAN, DTP and DTN are illustrated explicitly as part of the control signals from the stack bus controller 410.
In the case of the PASSTHROUGH mode, where BSI is the same as the input data, the signal ONE is at logical “1”, ONEB<0> at “0” and ONEB<1> at “0”. This will disable the pull-up or pull-down but enable the transfer gate 522 to pass the data on the PBUS 505 to the output 523. In the case of the INVERTED mode, where BSI is the invert of the input data, the signal ONE is at “0”, ONEB<0> at “1” and ONEB<1> at “1”. This will disable the transfer gate 522. Also, when PBUS is at “0”, the pull-down circuit will be disabled while the pull-up circuit is enabled, resulting in BSI being at “1”. Similarly, when PBUS is at “1”, the pull-up circuit is disabled while the pull-down circuit is enabled, resulting in BSI being at “0”. Finally, in the case of the FLOATED mode, the output BSI can be floated by having the signal ONE at “1”, ONEB<0> at “1” and ONEB<1> at “0”. The FLOATED mode is listed for completeness although in practice, it is not used.
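These three modes can be summarized in a small behavioral model; this is a sketch of the truth table above using the signal names from the text, not the transistor-level circuit:

```python
# Behavioral model of the input-logic modes described above; "Z" marks
# a floated BSI node. The tuples record the (ONE, ONEB<0>, ONEB<1>)
# control settings from the text.

MODES = {
    "PASSTHROUGH": (1, 0, 0),   # transfer gate 522 passes PBUS to BSI
    "INVERTED":    (0, 1, 1),   # pull-up/pull-down branches invert PBUS
    "FLOATED":     (1, 1, 0),   # nothing drives BSI (unused in practice)
}

def input_logic(pbus, mode):
    if mode not in MODES:
        raise ValueError(mode)
    if mode == "PASSTHROUGH":
        return pbus             # BSI equals the input data
    if mode == "INVERTED":
        return 1 - pbus         # PBUS=0 pulls BSI up; PBUS=1 pulls it down
    return "Z"                  # FLOATED

assert input_logic(1, "PASSTHROUGH") == 1
assert input_logic(1, "INVERTED") == 0
assert input_logic(0, "FLOATED") == "Z"
```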
One feature of the invention is to constitute the pull-up circuits with PMOS transistors and the pull-down circuits with NMOS transistors. Since the pull by the NMOS is much stronger than that of the PMOS, the pull-down will always overcome the pull-up in any contention. In other words, the node or bus can always default to a pull-up or “1” state, and, if desired, can always be flipped to a “0” state by a pull-down.
In the FLOATED mode, all four branches are disabled. This is accomplished by having the signals PINV=1, NINV=0, PDIR=1, NDIR=0, which are also the default values. In the PASSTHROUGH mode, when MTCH=0, it will require PBUS=0. This is accomplished by only enabling the pull-down branch with n-transistors 535 and 536, with all control signals at their default values except for NDIR=1. When MTCH=1, it will require PBUS=1. This is accomplished by only enabling the pull-up branch with p-transistors 533 and 534, with all control signals at their default values except for PINV=0. In the INVERTED mode, when MTCH=0, it will require PBUS=1. This is accomplished by only enabling the pull-up branch with p-transistors 531 and 532, with all control signals at their default values except for PDIR=0. When MTCH=1, it will require PBUS=0. This is accomplished by only enabling the pull-down branch with n-transistors 537 and 538, with all control signals at their default values except for NINV=1. In the PRECHARGE mode, the control signal settings of PDIR=0 and PINV=0 will enable either the pull-up branch with p-transistors 531 and 532 when MTCH=1 or the pull-up branch with p-transistors 533 and 534 when MTCH=0.
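The output-logic settings can likewise be modeled behaviorally (a sketch of the signal table above, not the transistor netlist):

```python
# Behavioral sketch of the output logic: defaults PINV=1, NINV=0,
# PDIR=1, NDIR=0 disable all four branches; each mode overrides one or
# two signals so the right branch drives PBUS from the latched MTCH
# value. "Z" marks an undriven PBUS.

DEFAULTS = dict(PINV=1, NINV=0, PDIR=1, NDIR=0)

def output_logic(mtch, mode):
    signals = dict(DEFAULTS)
    if mode == "FLOATED":
        return signals, "Z"            # all branches disabled
    if mode == "PASSTHROUGH":          # PBUS follows MTCH
        if mtch == 0:
            signals["NDIR"] = 1        # pull-down branch 535/536
        else:
            signals["PINV"] = 0        # pull-up branch 533/534
        return signals, mtch
    if mode == "INVERTED":             # PBUS is MTCH inverted
        if mtch == 0:
            signals["PDIR"] = 0        # pull-up branch 531/532
        else:
            signals["NINV"] = 1        # pull-down branch 537/538
        return signals, 1 - mtch
    if mode == "PRECHARGE":            # a pull-up branch either way
        signals.update(PDIR=0, PINV=0)
        return signals, 1
    raise ValueError(mode)

assert output_logic(0, "PASSTHROUGH") == (dict(DEFAULTS, NDIR=1), 0)
assert output_logic(1, "INVERTED") == (dict(DEFAULTS, NINV=1), 0)
assert output_logic(0, "PRECHARGE")[1] == 1
```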
Common processor operations are developed more fully in U.S. patent publication number US-2006-0140007-A1, which is hereby incorporated in its entirety by this reference.
Column Redundancy Circuitry
Non-volatile memories, such as those described in the preceding sections, often have failures in the column-related circuitry, which can show up as bit line shorts, open bit lines, and data latch read/write failures, for example. As the scale of memory devices decreases, while the lengths of bit lines and word lines grow, such memory circuits become more susceptible to bit line and word line failures. (Methods of detecting and dealing with defective word lines are discussed in US patent publication and application numbers: US-2012-0008405-A1; US-2012-0008384-A1; US-2012-0008410-A1; Ser. No. 13/193,148; Ser. No. 13/332,780; and Ser. No. 13/411,115.) To be stored reliably, user data needs to be written into and accessed from good columns rather than bad columns, and these bad columns need to be ignored and/or replaced during memory data input and output operations. This section presents a column redundancy circuit that reduces circuit size and improves performance. User data is grouped in an interleaved manner so that data belonging to consecutive logical addresses is distributed into different physical locations. For example, all column data can be physically grouped into, say, 5 divisions and user data can be written into or accessed from one division after another consecutively. Each division has its own clock control. The column redundancy block can generate information on bad column locations and send it to control logic to switch the user clock to a different division clock, thereby skipping bad columns. By controlling the clocks for the different divisions, the user can directly access good columns without touching bad columns.
A number of previous approaches are known for addressing defective columns, some of which are discussed in U.S. Pat. Nos. 6,985,388; 7,170,802; 7,663,950; 7,974,124; US patent publication number US-2011-0002169-A1; and U.S. patent application Ser. No. 13/420,961 filed Mar. 15, 2012. For example, in some memory designs, a number of spare columns are set aside and the column redundancy circuits use the spare columns to replace the defective columns, so that when a defective column is to be accessed, it is remapped to a replacement from the set of spares. This solution has the drawback that, as the spare columns can also have defects, these will in turn need other spare columns to repair. High speed operation is also a concern in such an arrangement. In another arrangement, the external controller stores the bad column locations and ignores those columns' data. This solution requires the controller to read the bad column locations from the memory during power-on. When the number of bad columns increases, the unused (bad column) data input/output can reduce the effective data performance. Therefore, memory circuits could benefit from better column redundancy circuitry, particularly if implemented inside the memory circuit and in a way that can be transparent to the controller so that performance is not adversely affected.
The arrangement presented in this section divides the physical columns evenly into a number of sub-divisions, where the exemplary embodiment uses 5 such divisions.
In the exemplary embodiment, each column holds one word, or two bytes. An example of the column data arrangement is shown in
A division is only selected when its corresponding clock goes high (CLK<i> for division i). If there is no bad column in the array, the clocks will run consecutively from Div0 to Div4 and repeat. An example of the timing relationship between the user clock and the internal divisions' individual clocks is shown in
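A behavioral sketch of this clock steering follows; the five-division round-robin is from the text, while the encoding of bad columns as (division, column) pairs is an assumption for illustration:

```python
# Sketch of per-division clock steering: each user clock pulse selects
# one division (CLK<i> goes high); with no bad columns the sequence is
# Div0..Div4 repeating, and a bad (division, column) slot is passed
# over without consuming a user clock pulse.

N_DIV = 5

def division_clocks(n_user_clocks, bad_slots=frozenset()):
    """Yield the (division, column) selected by each user clock pulse."""
    div, col = 0, 0
    for _ in range(n_user_clocks):
        while (div, col) in bad_slots:                  # skip bad slots
            div, col = (div + 1) % N_DIV, col + (div == N_DIV - 1)
        yield div, col
        div, col = (div + 1) % N_DIV, col + (div == N_DIV - 1)

# No bad columns: clocks run consecutively from Div0 to Div4 and repeat.
assert [d for d, _ in division_clocks(6)] == [0, 1, 2, 3, 4, 0]
# A bad column in Div2: the user clock switches from CLK<1> to CLK<3>.
assert [d for d, _ in division_clocks(4, {(2, 0)})] == [0, 1, 3, 4]
```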
The arrangement shown in
In the example of
The listing of bad column addresses can be stored outside of the peripheral circuit in non-volatile memory, such as in a fusible ROM or even the memory array itself, depending on the embodiment. In the exemplary embodiment, the bad columns are determined and set at test time, such as part of a built-in self-test (BIST) process, although other embodiments could subsequently update the listing.
Using this arrangement, the memory circuit can use the bad column addresses to take the user clock inputs and generate the individual clocks for the different divisions. This allows the memory circuit to automatically skip bad columns and access only the good columns for user data, without the need to assign spare columns as in conventional schemes and without bad columns impacting performance. The arrangements of this section are developed further in U.S. patent application Ser. No. 13/463,422.
Variable Rate Parallel to Serial Shift Registers
The preceding discussion has looked at ways of transferring data to and from a non-volatile memory that can have defective columns that need to be skipped. In a read operation, this means that data is retrieved from the array in a parallel format before being sent out on a data bus in serial format. In a write operation, the data comes in serially on the bus and is then transferred to the column latches in parallel. Consequently, the read and write operations respectively use parallel-to-serial and serial-to-parallel data shift registers; and, due to the need to skip bad columns, in both cases these are shift registers of variable rate. The next section considers the case of variable rate serial to parallel shift registers further, while this section looks at the parallel to serial case. Specifically, this section looks at techniques that borrow data to reduce worst-case timing in variable rate parallel to serial shift registers. A subsequent section will relate both of these cases back to the memory array structures described above.
In a variable rate parallel to serial shift register, there are some locations in the parallel shift register that are to be skipped. For instance, this would be the case for bad columns when the parallel data is being read out of an array such as in the preceding section. As the serial clock shifts data out of the serial SR at a steady rate, the parallel clock will need to load another set of data into the parallel SR sooner. Consequently, the PCLK period is not a fixed cycle, with the PCLK period depending on the number of skipped entries in the previous cycle. Each unit of the serial shift register needs to have a bypass function if it loads data that is to be skipped from the parallel shift register. An example of this is illustrated in
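Independent of the figure, the timing rule can be sketched behaviorally: good words leave on successive SCLK cycles, and the next PCLK arrives after as many SCLK cycles as there were non-skipped words in the load. The group width and data model here are illustrative:

```python
# Sketch of a variable rate parallel-to-serial shift register: SCLK
# runs at a steady rate while PCLK reloads the parallel register early
# whenever some of the loaded words are marked for skipping.

def serialize(groups, skip_flags):
    """Return (sclk_cycle, word) pairs and the PCLK period per group."""
    stream, pclk_periods, t = [], [], 0
    for group, skips in zip(groups, skip_flags):
        kept = [w for w, s in zip(group, skips) if not s]
        for word in kept:                # one good word out per SCLK
            stream.append((t, word))
            t += 1
        pclk_periods.append(len(kept))   # next PCLK this many SCLKs later
    return stream, pclk_periods

groups = [["w0", "w1", "w2", "w3"], ["w4", "w5", "w6", "w7"]]
skips  = [[False, True, True, False], [False, False, False, False]]
stream, periods = serialize(groups, skips)

assert [w for _, w in stream] == ["w0", "w3", "w4", "w5", "w6", "w7"]
assert periods == [2, 4]   # two skipped entries shorten the first period
```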
The skipped entries will increase the amount of gate delay and signal travel distance in one SCLK cycle. For example, in
Under the arrangement of
To generate the PCLK and control signals that bring this about, some logic is needed; it is represented as Logic 721. For the example of a non-volatile memory array, this would again be part of the peripheral and decoding circuitry for the array. The relevant parts for the discussion here include a PCLK generation circuit 725 that receives the serial clock SCLK and the Skip data and then generates PCLK as a multiple of SCLK, where the multiple is the number of entries in 701 that are not to be skipped. The Skip data comes from the memory 731 and, in the example of a column-based memory array, would be the bad column location storage, similar to element 653 in
In this exemplary embodiment, the MUXs are arranged so that an element of SR 703 can receive refresh data from one of, at most, two different BUS data; for example, R1 can receive BUS1 or BUS2, but not BUS3. Consequently, this arrangement reduces the maximum skip needed in the serial shift register by 1. In other embodiments, the arrangement can be set up to allow more general transfers that further reduce the number of skipped elements. This is largely a design choice, since a more general MUX arrangement adds complexity to this part of the circuit but reduces the amount of circuitry needed to skip units of the serial shift register. The decision can be based on how frequently skips are expected. In this case, more than one skip per set of data coming in on the serial bus 705 is infrequent enough that reducing the maximum skip by 1 is considered a good compromise between increased MUX complexity and serial SR skips. Consequently, although the number of needed skips is reduced, in this embodiment the serial shift register 703 will still need the ability to skip units. This was seen in
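The borrow mechanism can be sketched as follows; the element and bus names (R_i, BUS_i) follow the text, while the one-hop reach rule and data model are an illustrative reading of it:

```python
# Sketch of the "borrow" MUXing: serial register element R_i can load
# BUS_i or BUS_(i+1), so a single skipped word per group is absorbed by
# the MUXes; any further skip still needs a bypass in the serial SR.

def load_with_borrow(bus, skip):
    """Return per-element (word, needs_bypass) after one-hop borrowing."""
    loaded, nxt = [], 0                  # nxt: next unconsumed good word
    for i in range(len(bus)):
        while nxt < len(bus) and skip[nxt]:
            nxt += 1                     # pass over words to be skipped
        if nxt <= i + 1 and nxt < len(bus):
            loaded.append((bus[nxt], False))   # within one MUX hop
            nxt += 1
        else:
            loaded.append((None, True))  # out of reach: bypass this unit
    return loaded

# One skipped word: every remaining element still loads good data (the
# tail slot is never shifted out before the next, earlier PCLK).
assert load_with_borrow(list("abcd"), [False, True, False, False]) == \
       [("a", False), ("c", False), ("d", False), (None, True)]
# Two adjacent skips exceed the one-hop reach: one unit must bypass.
assert load_with_borrow(list("abcd"), [False, True, True, False]) == \
       [("a", False), (None, True), ("d", False), (None, True)]
```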
For any embodiment employing this technique, reducing the maximum skip distance in the serial shift register shortens the amount of gate delay and signal travel distance, reducing or even eliminating the timing bottleneck in the serial SR so that higher speed can be achieved.
Serial to Parallel Shift Register
This section looks at the serial to parallel transition, where the latches are closed in a sort of “sliding door” arrangement for fixed or variable rate serial to parallel shift registers. The aspects described in this section can reduce the number of high speed signals needed in serial to parallel circuits.
In
In this arrangement, the input boundary signals are SDATA, CLK0˜3 and PCLK in this example. Consequently, there are many high speed clocks with low duty cycle that need to be provided to the latch structures. This will require more routing space on the system to avoid degradation of the clock pulse shape that would otherwise occur. The generation of the CLK0˜N signals depends on how many skip locations are allowed in one parallel cycle, where the more allowed skips, the more cases need to be considered in the design. Rather than trying to generate the many different clock signals for each of the latches, the exemplary embodiment here makes each latch clock default to open and close only when needed. This is illustrated in
In
Consequently, as shown at the first falling PCLK, w0 is loaded at all of LAT0˜3. As BAD1 is high, at the next SCLK LATCLK0 and LATCLK1 fall. w1 is then loaded at LAT2 and LAT3, after which LATCLK2 falls. w2 is then loaded in at LAT3. When PCLK next falls, all of LAT0˜3 are loaded onto PBUS0˜3. Although w0 is still in LAT1 and is thus loaded onto PBUS1, this corresponds to data to be ignored (w##). Note that when the corresponding LATCLK is high, the latch is open and the SDATA will pass through the latch, with the PCLK falling edge taking a snapshot of all the latch elements and putting it on the PBUS. Under this arrangement, sometimes LATCLK2 may not close, in which case LAT3 follows SDATA.
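The sliding-door behavior can be captured in a short sketch; the latch width and bad-flag encoding are illustrative, and the example reproduces the w0/w1/w2 sequence just described:

```python
# Behavioral sketch of the "sliding door" latches: all latches default
# open (transparent to SDATA); after each serial word, the leading open
# latch closes, and a latch over a bad position closes along with it so
# the bad slot just retains don't-care data. The falling PCLK then
# snapshots every latch onto the parallel bus.

def sliding_door(words, bad, width=4):
    """Return the PBUS snapshot after one parallel cycle."""
    lat = [None] * width       # latch contents; every latch starts open
    closed = 0                 # latches 0..closed-1 have shut
    for w in words:
        for i in range(closed, width):
            lat[i] = w         # open latches are transparent to SDATA
        closed += 1            # close the leading open latch...
        while closed < width and bad[closed]:
            closed += 1        # ...sliding past bad positions as well
        if closed >= width:
            break              # all doors closed until the next PCLK
    return lat                 # falling PCLK: snapshot onto PBUS0~3

# The example from the text: position 1 is bad, so w0 also stays in
# LAT1 as ignored data while w1 and w2 land in LAT2 and LAT3.
assert sliding_door(["w0", "w1", "w2"], [False, True, False, False]) == \
       ["w0", "w0", "w1", "w2"]
```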
In the exemplary embodiment, the maximum skip for the 1-to-4 serial to parallel conversion is two, although further optimization is available to reduce the number of skips needed. Within a PCLK cycle, LATCLK0 and LATCLK1 need not be set to 1: if BAD0 is high, LATCLK0 will not be set to 1, and if BAD0 and BAD1 are both high, LATCLK0 and LATCLK1 will both remain at 0 for the PCLK cycle. This removes about 30% of the skip cases.
Memory Array Access with Bad Column Information
This section relates the parallel-to-serial and serial-to-parallel arrangements of the last two sections back to their use in transferring data from and to a memory array that is accessed on a column basis, where some of the columns are to be ignored. In this way it is similar to the earlier sections above, but employs the techniques described with respect to
The diagram of
The ideas of the preceding several sections are developed further in U.S. patent application Ser. Nos. 13/630,163 and 13/630,278.
Centralization of Variable Rate Serializer and Deserializer
As discussed above, data is received onto, and transferred out from, the memory chip over the bus in a serial manner, where data comes in a word at a time. (Here the word is taken to be 16 bits, but it may more generally be of other sizes.) Data is written into, and read out of, the memory array in pages of data, where the read and write pages are typically of a much higher degree of parallelism. Once the data is received on the memory circuit in the above arrangements, it is distributed among the read/write circuitry of the different divisions (such as 641-0 to 641-4 of
In parallel format, data can be transferred at a higher rate. Consequently, once the data is on the memory chip, the earlier the data is converted into parallel format, the higher the rate at which data can be transferred within the memory circuit. Referring back to
More specifically, as in the preceding sections, multiple columns are defined as a parallel group, and some of the columns in this parallel group could be bad. The parallel group is read or written in one parallel clock cycle. The column redundancy (CRD) block can generate information on bad column locations and send it to control logic to control how serial data is converted to parallel data. Now, however, the serialization/deserialization is moved in location and, on the parallel end, a double data rate (DDR) type of arrangement is used to further slow down the frequency and save clock power.
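As a rough illustration of the centralized, DDR-style deserialization (the grouping, the bad-slot encoding and the edge tagging here are assumptions for the sketch, not the circuit of the embodiment):

```python
# Illustrative sketch of a centralized deserializer feeding N=5
# divisions: serial words are packed into 5-word parallel groups, slots
# flagged bad by the CRD information receive no user word, and the
# groups are handed off on alternating clock edges so the parallel
# clock runs at half the group rate (a DDR-style transfer).

def deserialize_ddr(words, bad_slots, n_div=5):
    """Return (edge, group) transfers; bad_slots holds (group, division)."""
    transfers, i, t = [], 0, 0
    while i < len(words):
        group = []
        for div in range(n_div):
            if (t, div) in bad_slots:
                group.append(None)        # bad column: slot skipped
            elif i < len(words):
                group.append(words[i])    # next serial word lands here
                i += 1
            else:
                group.append(None)        # serial stream exhausted
        transfers.append(("rise" if t % 2 == 0 else "fall", group))
        t += 1
    return transfers

words = [f"w{k}" for k in range(8)]
out = deserialize_ddr(words, bad_slots={(0, 2)})
assert out[0] == ("rise", ["w0", "w1", None, "w2", "w3"])
assert out[1] == ("fall", ["w4", "w5", "w6", "w7", None])
```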
In the embodiment of
The input to the Data sort unit 1063 is the FIFO_out output of the FIFO unit 1067. As in
In the embodiment of
Although the various aspects of the present invention have been described with respect to certain embodiments, it is understood that the invention is entitled to protection within the full scope of the appended claims.
This application is a continuation in part of U.S. patent application Ser. No. 13/630,163, filed on Sep. 28, 2012, and is related to U.S. patent application Ser. No. 13/630,278, also filed on Sep. 28, 2012, both of which are hereby incorporated by reference in their entirety.
Number | Name | Date | Kind |
---|---|---|---|
3710348 | Craft | Jan 1973 | A |
3895360 | Cricchi et al. | Jul 1975 | A |
4330862 | Smolik | May 1982 | A |
4357685 | Daniele et al. | Nov 1982 | A |
4426688 | Moxley | Jan 1984 | A |
4720815 | Ogawa | Jan 1988 | A |
4757477 | Nagayama et al. | Jul 1988 | A |
5070032 | Yuan et al. | Dec 1991 | A |
5095344 | Harari | Mar 1992 | A |
5172338 | Mehrotra et al. | Dec 1992 | A |
5200959 | Gross et al. | Apr 1993 | A |
5270979 | Harari et al. | Dec 1993 | A |
5313421 | Guterman et al. | May 1994 | A |
5315541 | Harari et al. | May 1994 | A |
5343063 | Yuan et al. | Aug 1994 | A |
5380672 | Yuan et al. | Jan 1995 | A |
5386390 | Okitaka | Jan 1995 | A |
5418752 | Harari et al. | May 1995 | A |
5428621 | Mehrotra et al. | Jun 1995 | A |
5430679 | Hiltebeitel et al. | Jul 1995 | A |
5430859 | Norman et al. | Jul 1995 | A |
5442748 | Chang et al. | Aug 1995 | A |
5479370 | Furuyama et al. | Dec 1995 | A |
5485425 | Iwai et al. | Jan 1996 | A |
5570315 | Tanaka et al. | Oct 1996 | A |
5595924 | Yuan et al. | Jan 1997 | A |
5596724 | Mullins et al. | Jan 1997 | A |
5602987 | Harari et al. | Feb 1997 | A |
5642312 | Harari | Jun 1997 | A |
5657332 | Auclair et al. | Aug 1997 | A |
5661053 | Yuan | Aug 1997 | A |
5663901 | Wallace et al. | Sep 1997 | A |
5712180 | Guterman et al. | Jan 1998 | A |
5768192 | Eitan | Jun 1998 | A |
5774397 | Endoh et al. | Jun 1998 | A |
5783958 | Lysinger | Jul 1998 | A |
5822245 | Gupta et al. | Oct 1998 | A |
5848009 | Lee et al. | Dec 1998 | A |
5862080 | Harari et al. | Jan 1999 | A |
5890192 | Lee et al. | Mar 1999 | A |
5903495 | Takeuchi et al. | May 1999 | A |
5930167 | Lee et al. | Jul 1999 | A |
5936971 | Harari et al. | Aug 1999 | A |
6011725 | Eitan | Jan 2000 | A |
6021463 | Belser | Feb 2000 | A |
6038167 | Miwa et al. | Mar 2000 | A |
6038184 | Naritake | Mar 2000 | A |
6046932 | Bill et al. | Apr 2000 | A |
6046935 | Takeuchi et al. | Apr 2000 | A |
6091666 | Arase et al. | Jul 2000 | A |
6151248 | Harari et al. | Nov 2000 | A |
6166990 | Ooishi et al. | Dec 2000 | A |
6222762 | Guterman et al. | Apr 2001 | B1 |
6230233 | Lofgren et al. | May 2001 | B1 |
6252800 | Chida | Jun 2001 | B1 |
6266273 | Conley et al. | Jul 2001 | B1 |
6282624 | Kimura et al. | Aug 2001 | B1 |
6353553 | Tamada et al. | Mar 2002 | B1 |
6426893 | Conley et al. | Jul 2002 | B1 |
6456528 | Chen | Sep 2002 | B1 |
6480423 | Toda et al. | Nov 2002 | B2 |
6509851 | Clark et al. | Jan 2003 | B1 |
6510488 | Lasser | Jan 2003 | B2 |
6512263 | Yuan et al. | Jan 2003 | B1 |
6522580 | Chen et al. | Feb 2003 | B2 |
6523132 | Harari et al. | Feb 2003 | B1 |
6560146 | Cernea | May 2003 | B2 |
6567307 | Estakhri | May 2003 | B1 |
6581142 | Jacobs | Jun 2003 | B1 |
6594177 | Matarrese et al. | Jul 2003 | B2 |
6643180 | Ikehashi et al. | Nov 2003 | B2 |
6657891 | Shibata et al. | Dec 2003 | B1 |
6732229 | Leung et al. | May 2004 | B1 |
6771536 | Li et al. | Aug 2004 | B2 |
6813184 | Lee | Nov 2004 | B2 |
6853596 | Cheung | Feb 2005 | B2 |
6870768 | Cernea et al. | Mar 2005 | B2 |
6967873 | Hamilton et al. | Nov 2005 | B2 |
6985388 | Cernea | Jan 2006 | B2 |
6990018 | Tanaka et al. | Jan 2006 | B2 |
6990025 | Kirihata et al. | Jan 2006 | B2 |
6996017 | Scheuerlein et al. | Feb 2006 | B2 |
7027330 | Park | Apr 2006 | B2 |
7039781 | Iwata et al. | May 2006 | B2 |
7057939 | Li et al. | Jun 2006 | B2 |
7058818 | Dariel | Jun 2006 | B2 |
7076611 | Steere et al. | Jul 2006 | B2 |
7110294 | Kawai | Sep 2006 | B2 |
7158421 | Li et al. | Jan 2007 | B2 |
7170802 | Cernea et al. | Jan 2007 | B2 |
7206230 | Li et al. | Apr 2007 | B2 |
7224605 | Moogat et al. | May 2007 | B1 |
7257689 | Baird | Aug 2007 | B1 |
7299314 | Lin et al. | Nov 2007 | B2 |
7310347 | Lasser | Dec 2007 | B2 |
7345928 | Li | Mar 2008 | B2 |
7405985 | Cernea et al. | Jul 2008 | B2 |
7411846 | Terzioglu | Aug 2008 | B2 |
7420847 | Li | Sep 2008 | B2 |
7426623 | Lasser | Sep 2008 | B2 |
7447070 | Cernea | Nov 2008 | B2 |
7490283 | Gorobets et al. | Feb 2009 | B2 |
7493457 | Murin | Feb 2009 | B2 |
7502254 | Murin et al. | Mar 2009 | B2 |
7502259 | Gorobets | Mar 2009 | B2 |
7663950 | Moogat et al. | Feb 2010 | B2 |
7974124 | Chibvongodze et al. | Jul 2011 | B2 |
7983374 | Lei et al. | Jul 2011 | B2 |
8144512 | Huang et al. | Mar 2012 | B2 |
8681548 | Liu et al. | Mar 2014 | B2 |
8730722 | Koh et al. | May 2014 | B2 |
8750042 | Sharon et al. | Jun 2014 | B2 |
8775901 | Sharon et al. | Jul 2014 | B2 |
8842473 | Tsai | Sep 2014 | B2 |
9076506 | Tsai | Jul 2015 | B2 |
20010000023 | Kawahara et al. | Mar 2001 | A1 |
20020118574 | Gongwer et al. | Aug 2002 | A1 |
20030007385 | Hosono et al. | Jan 2003 | A1 |
20030182317 | Kahn et al. | Sep 2003 | A1 |
20030223274 | Cernea | Dec 2003 | A1 |
20040060031 | Cernea | Mar 2004 | A1 |
20040109357 | Cernea et al. | Jun 2004 | A1 |
20040218634 | Peng et al. | Nov 2004 | A1 |
20050073884 | Gonzalez et al. | Apr 2005 | A1 |
20050078517 | Abedifard | Apr 2005 | A1 |
20050144365 | Gorobets et al. | Jun 2005 | A1 |
20050180536 | Payne et al. | Aug 2005 | A1 |
20050213393 | Lasser | Sep 2005 | A1 |
20060126390 | Gorobets et al. | Jun 2006 | A1 |
20060136656 | Conley et al. | Jun 2006 | A1 |
20060140007 | Cernea et al. | Jun 2006 | A1 |
20060161728 | Bennett et al. | Jul 2006 | A1 |
20070061502 | Lasser | Mar 2007 | A1 |
20070065119 | Pomerantz | Mar 2007 | A1 |
20070091677 | Lasser | Apr 2007 | A1 |
20070103977 | Conley et al. | May 2007 | A1 |
20070103978 | Conley et al. | May 2007 | A1 |
20070159652 | Sato | Jul 2007 | A1 |
20070180346 | Murin | Aug 2007 | A1 |
20070186032 | Sinclair et al. | Aug 2007 | A1 |
20070211530 | Nakano | Sep 2007 | A1 |
20070220197 | Lasser | Sep 2007 | A1 |
20070220935 | Cernea | Sep 2007 | A1 |
20070237006 | Murin et al. | Oct 2007 | A1 |
20070260808 | Raines et al. | Nov 2007 | A1 |
20070268745 | Lasser | Nov 2007 | A1 |
20070283081 | Lasser | Dec 2007 | A1 |
20070285980 | Shimizu et al. | Dec 2007 | A1 |
20080062761 | Tu et al. | Mar 2008 | A1 |
20080104309 | Cheon et al. | May 2008 | A1 |
20080104312 | Lasser | May 2008 | A1 |
20080147996 | Jenkins et al. | Jun 2008 | A1 |
20080159012 | Kim | Jul 2008 | A1 |
20080181000 | Lasser | Jul 2008 | A1 |
20080209112 | Yu et al. | Aug 2008 | A1 |
20080244338 | Mokhlesi et al. | Oct 2008 | A1 |
20080244367 | Chin et al. | Oct 2008 | A1 |
20080250220 | Ito | Oct 2008 | A1 |
20080250300 | Mokhlesi et al. | Oct 2008 | A1 |
20080279005 | France | Nov 2008 | A1 |
20080294814 | Gorobets | Nov 2008 | A1 |
20080301532 | Uchikawa et al. | Dec 2008 | A1 |
20080310224 | Roohparvar et al. | Dec 2008 | A1 |
20090067244 | Li | Mar 2009 | A1 |
20090089481 | Kapoor et al. | Apr 2009 | A1 |
20090089520 | Saha et al. | Apr 2009 | A1 |
20090094482 | Zilberman | Apr 2009 | A1 |
20090172498 | Shlick et al. | Jul 2009 | A1 |
20090310408 | Lee et al. | Dec 2009 | A1 |
20100107004 | Bottelli et al. | Apr 2010 | A1 |
20100157641 | Shalvi et al. | Jun 2010 | A1 |
20100172179 | Gorobets et al. | Jul 2010 | A1 |
20100172180 | Paley et al. | Jul 2010 | A1 |
20100174845 | Gorobets et al. | Jul 2010 | A1 |
20100174846 | Paley et al. | Jul 2010 | A1 |
20100174847 | Paley et al. | Jul 2010 | A1 |
20100254198 | Bringivijayaraghavan et al. | Oct 2010 | A1 |
20100287217 | Borchers et al. | Nov 2010 | A1 |
20100325351 | Bennett | Dec 2010 | A1 |
20110002169 | Li et al. | Jan 2011 | A1 |
20110063909 | Komatsu | Mar 2011 | A1 |
20110099460 | Dusija et al. | Apr 2011 | A1 |
20120008384 | Li et al. | Jan 2012 | A1 |
20120008405 | Shah et al. | Jan 2012 | A1 |
20120008410 | Huynh et al. | Jan 2012 | A1 |
20120120733 | Son et al. | May 2012 | A1 |
20140092690 | Tsai | Apr 2014 | A1 |
20140126293 | Tsai et al. | May 2014 | A1 |
Number | Date | Country |
---|---|---|
1549133 | Nov 2004 | CN |
61292747 | Dec 1986 | JP |
01128297 | May 1989 | JP |
06150666 | May 1994 | JP |
WO 9844420 | Oct 1998 | WO |
WO 0049488 | Aug 2000 | WO |
WO 03025939 | Mar 2003 | WO |
WO 03027828 | Apr 2003 | WO |
WO 2006064318 | Jun 2006 | WO |
WO 2007141783 | Dec 2007 | WO |
Entry |
---|
Eitan et al., “NROM: A Novel Localized Trapping, 2-Bit Nonvolatile Memory Cell,” IEEE Electron Device Letters, vol. 21, No. 11, Nov. 2000, pp. 543-545. |
U.S. Appl. No. 61/142,620 entitled “Non-Volatile Memory and Method With Improved Block Management System,” filed Jan. 5, 2009, 144 pages. |
U.S. Appl. No. 12/348,819 entitled “Wear Leveling for Non-Volatile Memories: Maintenance of Experience Count and Passive Techniques,” filed Jan. 5, 2009, 73 pages. |
U.S. Appl. No. 12/348,825 entitled “Spare Block Management in Non-Volatile Memories,” filed Jan. 5, 2009, 76 pages. |
U.S. Appl. No. 12/348,891 entitled “Non-Volatile Memory and Method With Write Cache Partitioning,” filed Jan. 5, 2009, 151 pages. |
U.S. Appl. No. 12/348,895 entitled “Nonvolatile Memory With Write Cache Having Flush/Eviction Methods,” filed Jan. 5, 2009, 151 pages. |
U.S. Appl. No. 12/348,899 entitled “Non-Volatile Memory and Method With Write Cache Partition Management Methods,” filed Jan. 5, 2009, 149 pages. |
U.S. Appl. No. 12/051,462 entitled “Adaptive Algorithm in Cache Operation with Dynamic Data Latch Requirements,” filed Mar. 19, 2008, 20 pages. |
U.S. Appl. No. 12/051,492 entitled “Different Combinations of Wordline Order and Look-Ahead Read to Improve Flash Memory Performance,” filed Mar. 19, 2008, 20 pages. |
U.S. Appl. No. 12/478,997 entitled Folding Data Stored in Binary Format into Multi-State Format Within Non-Volatile Devices, filed Jun. 5, 2009, 52 pages. |
“Numonyx Sector-Based Compact File System (SCFS) Software is a Feature-Rich Flash Solution,” Numonyx, Nov. 3, 2009, 2 pages. |
“SanDisk, Toshiba Develop 32-Nanometer NAND Flash Technology,” SanDisk Corporation and Toshiba Corporation, Feb. 11, 2009, www.physorg.com/news153597019.html, 9 pages. |
U.S. Appl. No. 12/642,740 entitled “Atomic Program Sequence and Write Abort Detection,” filed Dec. 18, 2009, 60 pages. |
U.S. Appl. No. 12/642,584 entitled “Maintaining Updates of Multi-Level Non-Volatile Memory in Binary Non-Volatile Memory,” filed Dec. 18, 2009, 74 pages. |
U.S. Appl. No. 12/642,611 entitled “Non-Volatile Memory with Multi-Gear Control Using On-Chip Folding of Data,” filed Dec. 18, 2009, 74 pages. |
U.S. Appl. No. 12/642,649 entitled “Data Transfer Flows for On-Chip Folding,” filed Dec. 18, 2009, 73 pages. |
Choudhuri et al., “Performance Improvement of Block Based NAND Flash Translation Layer,” CODES + ISSS '07, Salzburg, Austria, Sep. 2007, pp. 257-262. |
Kang et al., “A Superblock-Based Flash Translation Layer for NAND Flash Memory,” EMSOFT'06, Oct. 2006, pp. 161-170. |
Im et al., “Storage Architecture and Software Support for SLC/MLC Combined Flash Memory,” Mar. 2009, ACM, SAC'09, pp. 1664-1669. |
Chang et al., “Real-Time Garbage Collection for Flash-Memory Storage Systems of Real-Time Embedded Systems,” Nov. 2004, ACM, ACM Transactions on Embedded Computing Systems, vol. 3, pp. 837-863. |
Lee et al., “Block Recycling Schemes and Their Cost-Based Optimization in NAND Flash Memory Based Storage System,” Oct. 2007, ACM., EMSOFT '07, pp. 174-182. |
U.S. Appl. No. 13/630,163, filed Sep. 28, 2012, 67 pages. |
U.S. Appl. No. 13/630,278, filed Sep. 28, 2012, 67 pages. |
Number | Date | Country | |
---|---|---|---|
20140126293 A1 | May 2014 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13630163 | Sep 2012 | US |
Child | 14104817 | US |