The present application claims priority to Chinese Patent Application No. 2024100481700, which was filed on Jan. 11, 2024, is titled “MEMORY SYSTEM, OPERATION METHOD OF MEMORY SYSTEM, AND MEMORY CONTROLLER,” and is hereby incorporated herein by reference in its entirety.
The present application relates to the field of memory technology, and in particular to a memory system, an operation method of a memory system, and a memory controller.
A NAND flash memory typically comprises a plurality of blocks, and each block comprises a plurality of pages. A minimum unit for data reading and writing is the page, and a minimum unit for data erasure is the block.
The present application provides a memory system, an operation method of a memory system, and a controller, and may implement an analysis of a cause of a fail page in the memory. The technical solutions are as follows:
In a first aspect, a memory system is provided, which comprises a memory and a controller coupled to the memory, wherein the memory comprises a plurality of blocks and at least one reserved block, each of the plurality of blocks comprises a plurality of pages, and each of the at least one reserved block comprises a plurality of backup pages; the controller is configured to:
In an example, the fail page is a page where a data read error occurs; and the controller is configured to: recover the data in the fail page; and
In an example, the controller is configured to:
In an example, the controller is configured to:
In an example, the fail page is a page where a data write error occurs; and the controller is configured to:
In an example, the controller is further configured to:
In an example, the controller is configured to:
In an example, the controller is configured to:
In an example, the at least one reserved block belongs to a target stripe, and the target stripe comprises at least one factory bad block (FBB).
In a second aspect, an operation method of a memory system is provided, wherein the memory system comprises a controller and a memory, the memory comprises a plurality of blocks and at least one reserved block, each of the plurality of blocks comprises a plurality of pages, and each of the at least one reserved block comprises a plurality of backup pages; the method comprises:
In an example, the fail page is a page where a data read error occurs; backing up, by the controller, the data in the fail page to one backup page in the at least one reserved block comprises: recovering, by the controller, the data in the fail page;
In an example, recovering, by the controller, the data in the fail page comprises:
In an example, recovering, by the controller, the data in the fail page based on the other data comprises:
In an example, the fail page is a page where a data write error occurs; backing up, by the controller, the data in the fail page to one backup page in the at least one reserved block comprises: acquiring, by the controller from an internal memory of the controller, data to be written to the fail page; and
In an example, the method further comprises:
In an example, comparing the raw data with the data in the one backup page in the at least one reserved block comprises:
In an example, backing up, by the controller, the data in the fail page to one backup page in the at least one reserved block comprises:
In a third aspect, a controller is provided, which comprises: a processor and an internal memory, wherein the internal memory is configured to store computer instructions, and the processor is configured to execute the computer instructions to implement the operation method of the memory system provided in the second aspect.
The technical solutions provided by the present application may comprise at least the following advantageous effects:
The present application provides a memory system, an operation method of a memory system, and a controller. In the memory system provided by the present application, the memory comprises a plurality of blocks and at least one reserved block. Upon detecting the fail page among the pages included in the plurality of blocks, the controller of the memory system can acquire the data in the fail page, and back up the data in the fail page to the one backup page in the at least one reserved block. As such, when a failure analysis for the fail page is required, the data can be read directly from the backup page for the failure analysis, so as to avoid leaving a risk in the memory system.
The drawings to be used in description of examples will be briefly introduced below in order to illustrate the technical solutions in the examples of the present application more clearly. Apparently, the drawings described below are only some examples of the present application. Those of ordinary skill in the art may obtain other drawings according to these drawings without creative work.
Examples of the present application are further described below in detail in conjunction with the drawings.
Solutions provided by the examples of the present application are applicable to an electronic apparatus. The electronic apparatus may be a mobile terminal, a desktop computer, a laptop computer, a tablet, a vehicle computer, a gaming console, a printer, a positioning apparatus, a wearable electronic apparatus, a smart sensor, a virtual reality (VR) apparatus, an augmented reality (AR) apparatus, or any other suitable electronic apparatus having a memory therein.
Examples of the present application further provide a memory system. With continued reference to
In the examples of the present application, the controller 200 may be configured to control operations performed by the memory 100, such as read, erase and program operations. The controller 200 may be further configured to manage various functions with respect to data stored or to be stored in the memory 100, including, but not limited to, bad block management, garbage collection (GC), logical-to-physical address translation, and wear leveling. In an example, the controller 200 may be further configured to process an error correcting code (ECC) with respect to data read from or written to the memory 100. The controller 200 may also perform any other suitable functions, e.g., formatting the memory 100.
The controller 200 may also communicate with an external apparatus according to a communication protocol. In an example, the controller 200 may communicate with an external apparatus through at least one of various interface protocols. The interface protocols may include a Universal Serial Bus (USB) protocol, a Multi-Media Card (MMC) protocol, a Peripheral Component Interconnect (PCI) protocol, a PCI-Express (PCI-E) protocol, an Advanced Technology Attachment (ATA) protocol, a serial ATA protocol, a parallel ATA protocol, a Small Computer System Interface (SCSI) protocol, an Enhanced Small Disk Interface (ESDI) protocol, an Integrated Drive Electronics (IDE) protocol, and a FireWire protocol.
In some examples, the controller 200 and one or more memories 100 may be integrated in various types of memory apparatuses.
As an example, as shown in
As another example, as shown in
In order to improve the reliability of data storage, the memory system typically employs RAID striping technology to protect and recover data.
In an example, referring to
When a data read error occurs in any page in the memory, i.e., when a fail page occurs, the controller may read the data stored in the pages of the RAID stripe to which the fail page belongs, other than the fail page itself, and may recover the data of the fail page based on the exclusive-OR relationship among the data in the stripe. The data read error may refer to an uncorrectable ECC (UECC) error. It may be understood that since the size of each page in the memory may be greater than the size of the code word, one fail page may belong to a plurality of RAID stripes. For example, assuming that one page can store 4 code words, one fail page may belong to 4 RAID stripes.
After the fail page is detected, the controller may also label the block to which the fail page belongs as a grown bad block (GBB) and trigger a garbage collection process. In the garbage collection process, the controller may rewrite valid data in the RAID stripe to which the GBB belongs to a new memory location, and may erase the blocks in that RAID stripe other than the GBB, i.e., the GBB is not involved in the garbage collection process.
Since the above garbage collection process erases the data in the blocks of the RAID stripe to which the GBB belongs, other than the GBB itself, the exclusive-OR relationship among the data of that RAID stripe is destroyed, rendering the data in the fail page in the GBB unrecoverable, i.e., the failure site is lost. After the failure site is lost, an effective analysis of the cause of the GBB cannot be implemented, that is, an effective analysis of the cause of the fail page cannot be implemented.
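The exclusive-OR recovery described above can be sketched as follows. This is a minimal illustration in Python, assuming a stripe whose parity code word is the bytewise XOR of its data code words; the helper names `xor_bytes`, `make_parity`, and `recover_member` and the sample values are hypothetical and are not taken from the present application.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length code words."""
    if len(a) != len(b):
        raise ValueError("code words must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_words: Sequence[bytes]) -> bytes:
    """Parity code word of a stripe: XOR of all data code words (assumed layout)."""
    return reduce(xor_bytes, data_words)


def recover_member(surviving_words: Sequence[bytes]) -> bytes:
    """Recover the single unreadable member from all other members plus parity."""
    return reduce(xor_bytes, surviving_words)


# Toy stripe with three data code words; the second one sits in the fail page.
stripe_data = [b"\x11\x22\x33\x44", b"\xa0\xb0\xc0\xd0", b"\x0f\x0e\x0d\x0c"]
parity = make_parity(stripe_data)
recovered = recover_member([stripe_data[0], stripe_data[2], parity])
```

The sketch also shows why erasing the other blocks of the stripe loses the failure site: once any surviving member is gone, the XOR no longer yields the missing code word.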
Examples of the present application provide an operation method of a memory system, wherein the method may back up data in a fail page for a subsequent analysis of a cause of the fail page, thus effectively improving the efficiency and accuracy of the failure analysis. The method is applicable to a memory system such as that shown in any of
As a possible implementation, as shown in
As another possible implementation, the stripe to which the at least one reserved block 102 belongs may also comprise no FBB, that is, the at least one reserved block 102 may be reserved in other manners before leaving the factory. For example, the at least one reserved block 102 may be randomly selected among all available blocks of the memory.
The operation method of a memory system provided by the examples of the present application is described below. As shown in
In operation 501, in response to detecting a fail page among pages included in the plurality of blocks, the controller acquires data in the fail page.
In the examples of the present application, upon detecting a data read error or a data write error in any page in the memory, the controller may determine that the page is a fail page.
In a first possible implementation, when a data write error, i.e., a programming status failure (PSF), occurs while the controller is writing data to any page, the controller may determine that the page is a fail page. Furthermore, the controller may acquire the data to be written to the fail page directly from its internal memory. The internal memory of the controller may be a static random access memory (SRAM).
In a second possible implementation, if a data read error, i.e., a UECC error, occurs while the controller is reading data from any page, the controller may determine that the page is a fail page. Furthermore, the controller may acquire the data in the fail page by recovering it.
As an example of the second implementation, the data in the fail page may be data subjected to data protection using a RAID technology. Accordingly, after detecting the fail page, the controller may read the other data in the RAID stripe to which the fail page belongs, i.e., the data other than that in the fail page. Subsequently, the data in the fail page may be recovered based on the other data thus read.
In an example, referring to
It may be understood that the fail page may comprise a plurality of code words. If a UECC error occurs for only part of the code words in the fail page during reading, the controller may read the data of the RAID stripes to which those code words belong and recover only those code words. That is, the other code words in the fail page, which can be read correctly, do not require data recovery using a RAID stripe.
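The selective recovery of only the failed code words may be illustrated by the following Python sketch. It assumes that a read returns `None` for a code word that hit a UECC error and that each code word of the page belongs to its own XOR-protected stripe; the names `xor_all` and `rebuild_page` and the data layout are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_all(words: Sequence[bytes]) -> bytes:
    """Bytewise XOR across a non-empty set of equal-length code words."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*words))


def rebuild_page(read_result: List[Optional[bytes]],
                 stripe_peers: List[List[bytes]]) -> List[bytes]:
    """read_result[i] is None when code word i hit a UECC error.

    stripe_peers[i] holds the other members (data and parity) of the stripe
    to which code word i belongs; it is consulted only for failed code words.
    """
    page = []
    for index, word in enumerate(read_result):
        if word is None:
            page.append(xor_all(stripe_peers[index]))  # RAID recovery
        else:
            page.append(word)  # read correctly, no recovery needed
    return page


# Toy page with 4 code words; only code word 2 is unreadable.
code_words = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
peer = b"\xaa\xbb"
parity = xor_all([code_words[2], peer])
rebuilt = rebuild_page(
    [code_words[0], code_words[1], None, code_words[3]],
    [[], [], [peer, parity], []],
)
```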
As another example of the second implementation, the data in the fail page may be data subjected to data protection using a backup strategy. The backup strategy may refer to storing the same piece of data in a plurality of different memory locations (e.g., a plurality of different memory planes) of the memory, so as to ensure that, if data in any of the memory locations is damaged, the data may be read from other memory locations. For example, the backup strategy may be used for data protection of important system data in the memory system, such as a logical-to-physical mapping (L2P) table. In this example, the controller may directly acquire data from a backup memory location corresponding to the fail page, and the acquired data is the data in the fail page.
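A minimal Python sketch of acquiring data under such a backup strategy is given below. It assumes a read callback that raises an exception on a UECC error; `UncorrectableReadError`, `read_with_backup`, and the toy medium are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, Optional


class UncorrectableReadError(Exception):
    """Raised by the hypothetical read callback when a UECC error occurs."""


def read_with_backup(locations: Iterable[int],
                     read_page: Callable[[int], bytes]) -> Optional[bytes]:
    """Try each memory location holding a copy and return the first good read."""
    for location in locations:
        try:
            return read_page(location)
        except UncorrectableReadError:
            continue  # this copy is damaged; try the next location
    return None  # every copy failed


# Toy medium: the same L2P fragment stored on three planes; plane 0 is damaged.
copies = {0: None, 1: b"L2P-fragment", 2: b"L2P-fragment"}


def toy_read(location: int) -> bytes:
    data = copies[location]
    if data is None:
        raise UncorrectableReadError(location)
    return data


acquired = read_with_backup([0, 1, 2], toy_read)
```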
In operation 502, the controller backs up the data in the fail page to one backup page in the at least one reserved block of the memory.
After acquiring the data in the fail page, the controller may back up the data to the one backup page in the at least one reserved block. As such, failure site data may be retained effectively, thereby increasing the efficiency of the subsequent failure analysis.
Exemplarily, referring to
It may be understood that for each reserved block, the controller may back up the data into the backup pages sequentially, in the order of the backup pages in the reserved block. That is, if the controller detects another fail page after writing data into a certain backup page, the controller may write the data of that fail page to the next backup page. Accordingly, the above target memory location may refer to the backup page immediately following the most recently written backup page. It may be also understood that if the memory space of a certain reserved block is full, the controller may continue to back up data in a next reserved block.
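The sequential selection of the target memory location may be sketched as follows. This is an illustrative Python model only; the class name `BackupAllocator`, the block numbers, and the choice to return `None` when all reserved blocks are full are assumptions.

```python
from typing import List, Optional, Tuple


class BackupAllocator:
    """Hands out backup pages in order, moving to the next reserved block when full."""

    def __init__(self, reserved_blocks: List[int], pages_per_block: int) -> None:
        self.reserved_blocks = list(reserved_blocks)
        self.pages_per_block = pages_per_block
        self.used_pages = 0  # number of backup pages already written

    def allocate(self) -> Optional[Tuple[int, int]]:
        """Return (reserved_block, backup_page) or None if all space is used."""
        block_slot, page = divmod(self.used_pages, self.pages_per_block)
        if block_slot >= len(self.reserved_blocks):
            return None  # reserved space is full; no further backup is written
        self.used_pages += 1
        return self.reserved_blocks[block_slot], page


# Two hypothetical reserved blocks with two backup pages each.
allocator = BackupAllocator(reserved_blocks=[900, 901], pages_per_block=2)
targets = [allocator.allocate() for _ in range(5)]
```

Keeping a single running count makes the next target a pure function of how many backups have been written so far.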
On the one hand, if the data in the fail page is the to-be-written data that is acquired by the controller from its internal memory or data recovered based on the RAID stripe, as shown in
On the other hand, if the data in the fail page is acquired by the controller directly from the backup memory location corresponding to the fail page based on the backup strategy, the controller may first perform LDPC decoding, MCRC detection, and de-scrambling for the acquired data. Subsequently, the controller may perform scrambling, MCRC generation, and LDPC coding sequentially for the de-scrambled data, and program it into the backup page.
In the examples of the present application, the controller may back up the data in the fail page to the one backup page in the at least one reserved block using a single-level cell (SLC) mode or a triple-level cell (TLC) mode. The SLC mode may effectively reduce the probability of a bit flip, ensuring the security of data storage. The TLC mode may effectively improve the utilization of memory space, so that a relatively large amount of backup data can be written to one reserved block.
With continued reference to
In operation 503, the controller labels a block to which the fail page belongs as a GBB.
In the examples of the present application, after detecting the fail page, the controller may label the block to which the fail page belongs as the GBB.
In operation 504, the controller reads raw data in the fail page after performing garbage collection for the memory.
It may be understood that, as shown in
After performing a garbage collection operation, the controller may read the raw data in the fail page if the analysis of the cause of the GBB, i.e., the failure cause of the fail page, is required. It may be understood that the controller may read the raw data in the fail page without the processes of LDPC decoding, MCRC detection, de-scrambling, etc., so that no data read error occurs.
In operation 505, the raw data is compared with the data in the one backup page in the at least one reserved block.
In the examples of the present application, the controller may read, from the backup page, the data in the fail page that is backed up by the controller. Subsequently, the data may be compared with the raw data in the fail page for an analysis, so as to clarify the failure cause.
In an example, referring to
In operation 5051, the controller reads the data in the one backup page in the at least one reserved block.
For the GBB to be analyzed, the controller may determine, from the at least one reserved block, the backup page for storing the data of the fail page in the GBB, and read the data in the backup page.
In operation 5052, the controller performs LDPC decoding, MCRC detection, and de-scrambling sequentially for the read data.
Referring to
In operation 5053, the controller generates a scrambling seed based on an address of the fail page.
It may be understood that, when writing the initial data to a page, the controller may randomly generate one scrambling seed based on a location of the page (i.e., a memory location of the initial data), and scramble the initial data based on the scrambling seed. Since different memory locations correspond to different scrambling seeds, there are different scrambled results for the same piece of initial data.
It can be seen that the same piece of initial data has different stored contents at different memory locations in the memory, that is, the stored content of a piece of initial data in the fail page differs from its stored content in the backup page. On this basis, the controller is required to identify the fail page in the GBB and generate the scrambling seed based on the address of the fail page.
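The dependence of the stored content on the memory location may be illustrated by the following Python sketch. It assumes that the seed can be reproduced from the page address and that scrambling is an XOR with a pseudo-random keystream; the mapping `seed_from_address`, the 16-bit xorshift generator, and the addresses are hypothetical stand-ins, not the scrambler of the present application.

```python
def seed_from_address(block: int, page: int) -> int:
    """Hypothetical address-to-seed mapping; a real controller may differ."""
    seed = ((block << 8) ^ page) & 0xFFFF
    return seed or 0xACE1  # the xorshift state must be nonzero


def scramble(data: bytes, seed: int) -> bytes:
    """XOR data with a 16-bit xorshift keystream; applying it twice de-scrambles."""
    state = seed
    out = bytearray()
    for byte in data:
        state ^= (state << 7) & 0xFFFF
        state ^= state >> 9
        state ^= (state << 8) & 0xFFFF
        out.append(byte ^ (state & 0xFF))
    return bytes(out)


initial_data = b"same initial data"
fail_page_addr = (12, 5)     # (block, page) of the fail page, illustrative
backup_page_addr = (900, 0)  # (block, page) of the backup page, illustrative

stored_in_backup = scramble(initial_data, seed_from_address(*backup_page_addr))
# De-scramble with the backup page's seed, then re-scramble with the seed
# generated from the address of the fail page.
descrambled = scramble(stored_in_backup, seed_from_address(*backup_page_addr))
expected_in_fail_page = scramble(descrambled, seed_from_address(*fail_page_addr))
```

Because the two addresses yield different seeds, the backup page content cannot be compared with the raw fail page data directly; it first has to be re-scrambled for the fail page address.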
In operation 5054, the controller scrambles de-scrambled data in the backup page based on the scrambling seed.
After generating the scrambling seed, the controller may scramble the de-scrambled data (i.e., the initial data or target write buffer data) in the backup page based on the scrambling seed.
In operation 5055, the controller performs MCRC generation and LDPC coding sequentially for scrambled data, and writes it to a cache of the controller.
The controller may also perform the MCRC generation and LDPC coding sequentially for the scrambled data, so as to acquire data to be written to the fail page. However, the fail page has failed, therefore as shown in
In operation 5056, the raw data is compared with the data stored in the cache.
Since the data stored in the cache is the correct data that should originally have been stored in the fail page, the failure cause of the fail page, i.e., the cause of the GBB, may be determined by comparing the data stored in the cache with the raw data in the fail page and analyzing the differences. In an example, the comparison analysis may be performed by a tester or may be implemented by a comparison analysis script, which is not limited in the examples of the present application.
It may be understood that, in most cases, there is only one fail page in one GBB, e.g., there is only one UECC page. Assuming that each block in the memory comprises 1392 SLC pages, one reserved block may store the data in the fail pages of 1392 GBBs, which provides a storage capacity far exceeding normal demand. It may be also understood that, if the storage space of the at least one reserved block is full, the controller may stop further writing and perform a debug analysis promptly.
For example, for a fail page where a UECC error occurs, a data shift error present in the fail page may be detected based on the data recovery and comparison processes shown in the above operations 5051 to 5056, which in turn allows the analysis to determine that the error occurred in a logic domain of the controller. As such, the efficiency of the failure analysis is improved effectively.
In summary, the examples of the present application provide an operation method of the memory system, wherein the memory of the memory system comprises a plurality of blocks and at least one reserved block. Upon detecting the fail page among the pages included in the plurality of blocks, the controller of the memory system can acquire the data in the fail page, and back up the data in the fail page to the one backup page in the at least one reserved block. As such, when a failure analysis for the fail page is required, the data can be read directly from the backup page for the failure analysis, so as to avoid leaving a risk in the memory system.
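A comparison analysis script of the kind mentioned above might, as a rough sketch, report the byte offsets and bit positions at which the raw data departs from the expected data. The Python below is an assumption-laden illustration; the function names `compare_pages` and `flipped_bit_count` and the sample bytes are hypothetical.

```python
from typing import List, Tuple


def compare_pages(raw: bytes, expected: bytes) -> List[Tuple[int, int]]:
    """Return (byte_offset, xor_mask) for every byte where the two pages differ."""
    if len(raw) != len(expected):
        raise ValueError("pages must have the same length")
    return [(i, r ^ e) for i, (r, e) in enumerate(zip(raw, expected)) if r != e]


def flipped_bit_count(mismatches: List[Tuple[int, int]]) -> int:
    """Total number of differing bits across all mismatching bytes."""
    return sum(bin(mask).count("1") for _, mask in mismatches)


expected_page = bytes([0x00, 0xFF, 0x0F, 0xA5])  # data rebuilt in the cache
raw_page = bytes([0x01, 0xFF, 0x0E, 0xA5])       # raw data read from the fail page
mismatches = compare_pages(raw_page, expected_page)
```

The pattern of the reported masks (isolated bit flips versus long runs of mismatching bytes) is the kind of evidence a tester could use when narrowing down the failure cause.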
The controller 200 may be configured to implement functions of the controller in the above examples to implement functions of the memory system provided by the examples of the present application. The implementations may be referred to the examples shown in
Examples of the present application further provide a computer-readable storage medium having instructions stored thereon, which, when executed by a processor in the controller, may implement any of the operations in the operation method of the memory system provided by the above examples.
Examples of the present application further provide a computer program product containing instructions, which, when executed by a processor of the controller, may implement any of the operations in the operation method of the memory system provided by the above examples.
In the present application, the terms “first” and “second” are only for the purpose of description, and cannot be construed as indicating or implying relative importance. The term “at least one” means one or more, and the term “a plurality of” means two or more, unless otherwise defined clearly.
The above descriptions are merely exemplary examples of the present application, and are not intended to limit the present application. The protection scope of the present application shall be subject to the protection scope of the claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2024100481700 | Jan 2024 | CN | national |