Embodiments described herein relate generally to decoding of an error-correcting code.
An error-correcting code has been used to correct an error in read data from, for example, a nonvolatile semiconductor memory such as a NAND memory. An LDPC (Low Density Parity Check) code as a kind of error-correcting code is known for its high error-correcting capabilities. It is also known that decoding performance of the LDPC code improves in proportion to the length of a codeword. For example, the length of a codeword adopted for a NAND flash memory is on the order of 10 kilobits.
The reliability of NAND read data is typically quantized to 5 or 6 bits in the form of a log likelihood ratio (LLR). That is, a memory (LMEM) to store the LLR of NAND read data needs a large capacity of at least the codeword length × the number of quantization bits. From the viewpoint of cost optimization, therefore, LMEM is generally implemented by using SRAM (Static Random Access Memory). Accordingly, when implementing a general LDPC decoder for a NAND memory, the calculation algorithm and hardware thereof are optimized for such a memory. For example, an LDPC decoder is designed based on a block-based parallel processing scheme that collectively performs memory access to the LLR stored in LMEM using continuous addresses.
The description of embodiments will be provided below with reference to the drawings. The same or similar symbols are attached to elements that are the same or similar to described elements to basically omit a duplicate description.
According to an embodiment, an error correction decoder includes a first storage unit, a second storage unit, a first calculation circuit and a second calculation circuit. The first storage unit stores first reliability information of each of a plurality of bits corresponding to an ECC (Error Correction Code) frame defined by a parity check matrix in which M×N (M and N are integers equal to 2 or greater) blocks are arranged, each of the blocks corresponding to either an invalid block as a zero matrix of p rows×p columns (p is an integer equal to 2 or greater) or a valid block as a nonzero matrix of p rows×p columns. The second storage unit stores second reliability information of each of the plurality of bits. The first calculation circuit reads the first reliability information corresponding to variable nodes belonging to each of one or more valid blocks arranged in a given row group of the parity check matrix from the first storage unit, calculates the second reliability information corresponding to the variable nodes by performing row processing based on the first reliability information, and writes the second reliability information to the second storage unit. The second calculation circuit reads the second reliability information corresponding to variable nodes belonging to each of the one or more valid blocks arranged in the given row group of the parity check matrix from the second storage unit, calculates the first reliability information corresponding to the variable nodes by performing column processing based on the second reliability information, and writes the first reliability information to the first storage unit. 
The first calculation circuit and the second calculation circuit perform the column processing based on the second reliability information corresponding to variable nodes belonging to each of one or more valid blocks arranged in a first row group and the row processing based on the first reliability information corresponding to variable nodes belonging to one or more valid blocks arranged in a second row group whose processing order is later than that of the first row group in parallel.
An LDPC code is defined by a parity check matrix. An error correction decoder typically corrects an error in LDPC coded data by performing iterative decoding using the parity check matrix.
In general, a row of a parity check matrix is called a check node and a column of a parity check matrix is called a variable node (or a bit node). The row weight means the total number of nonzero elements contained in a row of interest and the column weight means the total number of nonzero elements contained in a column of interest. In a parity check matrix defining a so-called regular LDPC code, the row weight is common to all rows and the column weight is common to all columns.
A parity check matrix H1 is exemplified in
A parity check matrix can be represented as a bipartite graph called a Tanner graph. More specifically, a variable node and a check node corresponding to a nonzero element in the parity check matrix are connected by an edge. That is, the total number of edges connected to a variable node is equal to the column weight of the variable node and the total number of edges connected to a check node is equal to the row weight of the check node.
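For illustration only, this correspondence between matrix weights and Tanner-graph edge counts may be sketched in software as follows (the 4×6 matrix below is hypothetical and is not one of the matrices of the embodiments):

```python
# Sketch: for a 0/1 parity check matrix, the number of Tanner-graph edges at a
# check node equals its row weight, and the number of edges at a variable node
# equals its column weight; both count the same set of edges.

def weights(H):
    """Return (row_weights, col_weights) of a 0/1 parity check matrix H."""
    row_weights = [sum(row) for row in H]
    col_weights = [sum(col) for col in zip(*H)]
    return row_weights, col_weights

# Hypothetical regular example: row weight 3, column weight 2.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
row_w, col_w = weights(H)
```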
The parity check matrix H1 in
In iterative decoding (ITR), a temporary estimated word is generated based on reliability information of each of a plurality of bits forming an LDPC frame. If the temporary estimated word satisfies the parity check, decoding terminates normally; if the temporary estimated word does not satisfy the parity check, decoding continues. More specifically, update processing of reliability information called row processing and column processing is performed for all check nodes and variable nodes in each trial of iterative decoding to re-generate a temporary estimated word based on the updated reliability information. If the temporary estimated word does not satisfy the parity check even when the trial count of iterative decoding reaches a predetermined upper limit, decoding is generally forced to terminate (abnormal termination). In the description that follows, iterative decoding means iteratively trying a sequence of processing that includes row processing and column processing for all check nodes of the parity check matrix, generation of a temporary estimated word, and the parity check of the temporary estimated word.
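For illustration only, the parity check on a temporary estimated word may be sketched as follows (a software model under the assumption of binary hard decisions; the matrix used is hypothetical):

```python
def parity_check(H, word):
    """True if the hard-decision word satisfies all check nodes of H,
    i.e. H * w^T = 0 over GF(2)."""
    return all(sum(h & b for h, b in zip(row, word)) % 2 == 0 for row in H)

# Hypothetical 2x3 check matrix: both checks force all three bits to be equal,
# so only the all-zero and all-one words satisfy the parity check.
H = [[1, 1, 0],
     [0, 1, 1]]
```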
As the reliability information, reliability information α (called, for example, an extrinsic value or extrinsic information) propagated from a check node to a variable node via an edge and reliability information β (called, for example, an a priori probability, a posteriori probability, probability, or LLR) propagated from a variable node to a check node via an edge are used. Further, a channel value λ depending on read data (or a received signal) corresponding to a variable node is used to calculate the reliability information α and the reliability information β.
Iterative decoding is performed based on, for example, the Sum-Product algorithm, the Min-Sum algorithm or the like. Iterative decoding based on these algorithms can be realized by parallel processing.
However, completely parallel processing in which all processing is parallelized needs a large number of calculation circuits. Specifically, the number of calculation circuits needed for completely parallel processing depends on the length of an LDPC codeword and thus, when the length of an LDPC codeword is long, completely parallel processing is not realistic.
According to so-called partially parallel processing, on the other hand, the circuit scale can be reduced. To realize partially parallel processing, typically M×N (M and N are natural numbers) blocks are arranged in a parity check matrix. Each of these blocks corresponds to a valid block (that is, an identity matrix of p (p is an integer equal to 2 or greater and is also called a block size) rows×p columns or a cyclic shift matrix of the identity matrix of p rows×p columns) or an invalid block (that is, a zero matrix of p rows×p columns). Partially parallel processing on such a parity check matrix can be realized by p calculation circuits regardless of the length of an LDPC codeword.
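For illustration only, this block structure may be sketched in software by representing each block with its shift value, an integer for a valid block and None for an invalid block (the representation and the base matrix in the test are illustrative, not part of the embodiments):

```python
def shift_block(p, s):
    """p x p block: cyclic shift of the identity matrix by shift value s for a
    valid block, or the zero matrix for s = None (an invalid block)."""
    if s is None:
        return [[0] * p for _ in range(p)]
    return [[1 if c == (r + s) % p else 0 for c in range(p)] for r in range(p)]

def expand(base, p):
    """Expand an M x N base matrix of shift values into the corresponding
    (M*p) x (N*p) parity check matrix."""
    rows = []
    for base_row in base:
        blocks = [shift_block(p, s) for s in base_row]
        for r in range(p):
            rows.append([x for b in blocks for x in b[r]])
    return rows
```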
In partially parallel processing, for example, a parity check matrix shown in
Each block in
In partially parallel processing, input variables of calculation circuits are controlled in accordance with the shift value of a valid block. For example, as shown in
LMEM variables are stored in a variable node storage unit (LMEM) shown in
When the shift value=0, as exemplified in
When the shift value=1, as exemplified in
When the shift value=7, as exemplified in
In general, the rotator needs to perform rotate processing of the rotate value=p−1 at the maximum. If the number of quantization bits of a TMEM variable is “u”, the input/output bit width of the rotator needs to be designed to have u×p bits or more.
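For illustration only, the rotator may be sketched as a cyclic rotation of p words (the rotation direction here is an assumption; the actual direction depends on the shift-value convention adopted):

```python
def rotate(words, s):
    """Cyclically rotate a list of p words by s positions (0 <= s <= p - 1)."""
    s %= len(words)
    return words[s:] + words[:s]

def bus_width(u, p):
    """With u quantization bits per word, the rotator must carry all p words
    at once, so its input/output width is at least u * p bits."""
    return u * p
```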
Partially parallel processing is generally implemented according to the block-based parallel processing scheme. In the block-based parallel processing scheme, memory access to the LLR stored in LMEM using continuous addresses is performed.
An error correction decoder of the block-based parallel processing scheme is exemplified in
The LLR conversion table 11 inputs ECC (Error Correction Code) frame data corresponding to an LDPC code read from a NAND flash memory (not shown). More specifically, the LLR conversion table 11 sequentially inputs read data of the amount corresponding to the block size in order from the start of the ECC frame data. The LLR conversion table 11 sequentially generates LLR data of the amount corresponding to the block size by converting read data into the LLR. LLR data of the amount corresponding to the block size from the LLR conversion table 11 is sequentially written to the LMEM 12.
The calculation unit 13 reads LLR data (specified by continuous addresses) of the amount corresponding to the block size from the LMEM 12, performs a calculation using the LLR data, and writes the calculation result to the LMEM 12. The calculation unit 13 includes as many calculation circuits as the block size. The calculation unit 13 also includes a rotator to perform a calculation in accordance with the shift value of the block. The rotator needs an input/output width of at least the number of quantization bits of the variable to be handled × the block size.
After a calculation by the calculation unit 13 is completed, a temporary estimated word based on LLR data is generated at DMEM. If a temporary estimated word satisfies parity checks of all check nodes, correction data corresponding to a data portion of the temporary estimated word is output to a host device (not shown).
As exemplified in
In Loop1 (row processing), various calculations are performed in parallel for variable nodes and check nodes belonging to a valid block to be processed. In Loop1, the valid block to be processed moves in the column direction in turn from the first column group of the i-th row group. For example, β for each variable node belonging to the valid block to be processed is calculated by subtracting α added to the LLR in column processing of the last iterative decoding of the block from the LLR corresponding to the variable node. β for each variable node is temporarily written to, for example, LMEM. Further, the minimum value βmin1 and the second smallest value βmin2 are detected with reference to the absolute value of β for each check node belonging to the valid block to be processed and INDEX as identification information of the variable node providing βmin1 is also detected. In addition, the parity check is conducted for each check node belonging to the valid block to be processed. βmin1, βmin2, INDEX, and a parity check result for each check node are temporarily written to TMEM. Incidentally, βmin1, βmin2, INDEX, and the parity check result for each check node may be updated as processing of valid blocks arranged in the i-th row group progresses. Then, βmin1, βmin2, INDEX, and the parity check result for each check node stored in TMEM when processing of all valid blocks arranged in the i-th row group is completed are used for subsequent column processing.
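For illustration only, the minimum-value detection of Loop1 may be sketched as a per-check-node software model of what the hardware performs in parallel (the function and its identifiers are illustrative, not part of the embodiments):

```python
def detect_min(indexed_betas):
    """For one check node, return (bmin1, bmin2, INDEX): the smallest and
    second-smallest |beta| among the connected variable nodes, and the
    identification information of the variable node providing bmin1."""
    bmin1 = bmin2 = float("inf")
    index = None
    for i, beta in indexed_betas:
        a = abs(beta)
        if a < bmin1:
            # The previous minimum becomes the second-smallest value.
            bmin1, bmin2, index = a, bmin1, i
        elif a < bmin2:
            bmin2 = a
    return bmin1, bmin2, index
```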
In Loop2 (column processing), the LLR for each variable node belonging to the valid block to be processed is updated in parallel. Also in Loop2, the valid block to be processed moves in the column direction in turn from the first column group of the i-th row group. More specifically, β for each variable node written to LMEM in row processing of the last iterative decoding is read and βmin1, βmin2, INDEX, and the parity check result for each check node are read from TMEM. α is added to β of each variable node and the calculation result is written to LMEM as the updated LLR of the variable node. The α added to β of each variable node depends on βmin1, βmin2, INDEX, and the parity check result detected in the one check node corresponding to the variable node of the valid block to be processed. More specifically, as will be described later, the absolute value of α depends on βmin1, βmin2, INDEX, and the identification information of the variable node, and the sign of α depends on the sign of β and the parity check result.
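For illustration only, the α selection and LLR update of Loop2 may be sketched as follows (a simplified software model in which the sign handling is reduced to a ±1 factor; the identifiers are illustrative):

```python
def alpha(i, bmin1, bmin2, index, sign):
    """Alpha for variable node i: the magnitude is bmin2 if node i itself
    provided bmin1 (i == INDEX), else bmin1; `sign` (+1 or -1) is decided
    from the sign of beta and the parity check result."""
    return sign * (bmin2 if i == index else bmin1)

def updated_llr(beta, a):
    """Column processing writes back LLR = beta + alpha."""
    return beta + a
```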
After the processing in
More specifically, according to the block-based parallel processing scheme, row processing and column processing proceed as exemplified in
In the block-based parallel processing scheme, row processing and column processing cannot be performed in parallel for the same valid block. This is because, for example, before row processing on all valid blocks arranged in a given row group is completed (that is, before βmin1, βmin2, INDEX, and the parity check results of all check nodes belonging to the row group are determined), column processing on any valid block arranged in the row group cannot be started. Therefore, β calculated in row processing needs to be written back to LMEM, which causes access congestion of LMEM, resulting in lower throughput.
Further, when the row size of a parity check matrix is two blocks or more, according to the ordinary block-based parallel processing scheme, it is difficult to perform column processing on a valid block arranged in a given row group and row processing on a valid block arranged in the next row group in parallel. This is because, for example, before column processing on a given valid block arranged in a given row group is completed (that is, the LLR of variable nodes belonging to the valid block is updated), row processing on a valid block arranged in the same column as the valid block in the next row group cannot be started.
Thus, as will be described later, an error correction decoder according to the first embodiment performs column processing on one or more valid blocks arranged in a first row group and row processing on one or more valid blocks arranged in a second row group whose processing order is later than the first row group in parallel by controlling at least one of the write order, read order, and read timing of the LLR or using a parity check matrix having a specific structure.
As exemplified in
As exemplified in
In the first stage, the β calculation circuit 106 reads the LLR of variable nodes belonging to n valid blocks from the LMEM 105 and reads the sign of α added to β of these variable nodes in the last column processing on the block from the SMEM 110. n means the aforementioned degree of parallelism. Before starting the second stage for the first valid block of the row group to be processed, the β calculation circuit 106 needs to read βmin1, βmin2, and INDEX of all check nodes belonging to the row group from the TMEM 109.
In the second stage, the β calculation circuit 106 calculates β. In the third stage, the β calculation circuit 106 writes β to the BMEM 107. Further, in the third stage, the minimum value detection and parity check circuit 108 detects βmin1, βmin2 and INDEX for each check node and also conducts the parity check for each check node. After the third stage is completed for all valid blocks arranged in the row group to be processed, the minimum value detection and parity check circuit 108 writes βmin1, βmin2, INDEX, and parity check results of all check nodes of the row group to the TMEM 109 and also outputs βmin1, βmin2, INDEX, and parity check results to the LLR calculation circuit 111.
In the fourth stage, the LLR calculation circuit 111 reads β of variable nodes of n valid blocks from the BMEM 107. In the fifth stage, the LLR calculation circuit 111 calculates the LLR. At this point, the minimum value detection and parity check circuit 108 writes the sign of α added to β to the SMEM 110. In the sixth stage, the LLR calculation circuit 111 writes the LLR to the LMEM 105.
The NAND read data input buffer 101 temporarily stores NAND read data from a NAND flash memory (not shown). The NAND read data has, for example, a parity bit added thereto in ECC frame units by an error correction encoder (not shown). The NAND read data input buffer 101 outputs stored NAND read data to the LLR conversion table 102 when necessary.
The LLR conversion table 102 converts NAND read data from the NAND read data input buffer 101 into reliability information (for example, LLR). The correspondence between NAND read data and reliability information is created in advance by, for example, a statistical technique. The LLR converted by the LLR conversion table 102 is written to the data buffer 103.
The data buffer 103 temporarily stores the LLR from the LLR conversion table 102. The LLR stored in the data buffer 103 is written to the LMEM 105 via the rotator 104.
The LMEM 105 stores the LLR from the data buffer 103. The LLR stored in the LMEM 105 is read in n-block units by the β calculation circuit 106 for row processing. Further, the LLR calculation circuit 111 writes the LLR updated through column processing to the LMEM 105 through the rotator 104. Incidentally, the LMEM 105 needs a storage capacity of at least the codeword length × the number of LLR quantization bits. From the viewpoint of cost optimization, therefore, the LMEM 105 may be implemented by using SRAM.
The β calculation circuit 106 calculates β of each variable node based on the LLR of each variable node read from the LMEM 105 and belonging to the valid block to be processed, βmin1, βmin2, and INDEX read from the TMEM 109 and detected in the last row processing of the block, and the sign of α read from the SMEM 110 and used in the last column processing of the block.
More specifically, the β calculation circuit 106 may calculate β of each variable node belonging to the valid block to be processed by subtracting α used in the last column processing of the block from the LLR of the variable node. Incidentally, the absolute value of βmin2 is used as the absolute value of α for a variable node having the same identification information as INDEX. On the other hand, the absolute value of βmin1 is used as the absolute value of α for a variable node having different identification information from INDEX. The β calculation circuit 106 writes the calculated β to the BMEM 107 and also outputs the β to the minimum value detection and parity check circuit 108.
The BMEM 107 stores β from the β calculation circuit 106. β stored in the BMEM 107 is read in n-block units by the LLR calculation circuit 111 for column processing.
The minimum value detection and parity check circuit 108 detects the minimum value βmin1 and the second smallest value βmin2 with reference to the absolute value of β calculated by the β calculation circuit 106 for each check node belonging to the valid block to be processed and further detects INDEX as identification information of the variable node providing the βmin1. The minimum value detection and parity check circuit 108 writes βmin1, βmin2, and INDEX to the TMEM 109 and also outputs βmin1, βmin2, and INDEX to the LLR calculation circuit 111.
The minimum value detection and parity check circuit 108 further uses β calculated by the β calculation circuit 106 to conduct the parity check for each check node belonging to the valid block to be processed. The parity check result is used to decide the sign of α added to β of each variable node corresponding to the check node to be processed. More specifically, the minimum value detection and parity check circuit 108 performs an EX-OR operation using sign bits of all β of check nodes to be processed. If the calculation result is 0, the parity check result is OK and if the calculation result is 1, the parity check result is NG. The minimum value detection and parity check circuit 108 writes the parity check result for each check node to the TMEM 109.
The minimum value detection and parity check circuit 108 further decides the sign of α added to β of each variable node in column processing of the row group to be processed. The sign of α can be decided based on the sign of the corresponding β and the parity check result of the corresponding check node. If, for example, the sign of β is 0 (that is, positive) and the parity check result is OK, the sign of α is also decided to be 0. On the other hand, if the sign of β is 0, but the parity check result is NG, the sign of α is decided to be 1 (that is, negative). If the sign of β is 1 and the parity check result is OK, the sign of α is also decided to be 1. On the other hand, if the sign of β is 1, but the parity check result is NG, the sign of α is decided to be 0.
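For illustration only, the parity check over sign bits and the resulting sign of α may be sketched as follows; the four cases described above collapse to a single EX-OR of two bits (the function names are illustrative):

```python
def parity_result(sign_bits):
    """EX-OR of the sign bits of all beta of one check node: 0 = OK, 1 = NG."""
    r = 0
    for s in sign_bits:
        r ^= s
    return r

def alpha_sign(beta_sign, parity):
    """Sign bit of alpha: the beta sign is kept when the parity check result
    is OK (0) and flipped when it is NG (1), i.e. an EX-OR of the two bits."""
    return beta_sign ^ parity
```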
The minimum value detection and parity check circuit 108 may be implemented as separate minimum value detection and parity check circuits. These separate minimum value detection and parity check circuits may be connected in parallel or connected in series.
The TMEM 109 stores various kinds of intermediate value data from the minimum value detection and parity check circuit 108. Various kinds of intermediate value data include, for example, βmin1, βmin2, INDEX, and a parity check result for each check node. Various kinds of intermediate value data stored in the TMEM 109 are read by the β calculation circuit 106 for row processing.
The SMEM 110 stores the sign of α from the minimum value detection and parity check circuit 108. The sign of α stored in the SMEM 110 is read by the β calculation circuit 106 for row processing.
The LLR calculation circuit 111 updates the LLR of each variable node belonging to the valid block to be processed based on β read from the BMEM 107 and βmin1, βmin2, INDEX, and the sign of α from the minimum value detection and parity check circuit 108. More specifically, the LLR may also be calculated by adding α to β.
The absolute value of α can be decided based on βmin1, βmin2, and INDEX. That is, the absolute value of βmin2 is used as the absolute value of α for a variable node having the same identification information as INDEX. On the other hand, the absolute value of βmin1 is used as the absolute value of α for a variable node having different identification information from INDEX.
The LLR calculation circuit 111 writes the updated LLR to the LMEM 105 via the rotator 104 and also writes the updated LLR to the DMEM 113 via the rotator 112.
In the DMEM 113, a sign bit (that is, a temporary estimated word) of the LLR updated by the LLR calculation circuit 111 is stored. The temporary estimated word stored in the DMEM 113 is read by the parity check circuit 114 in each trial of iterative decoding (for example, each time the processing in
The parity check circuit 114 conducts the parity check of a temporary estimated word read from the DMEM 113 using a parity check matrix. If a temporary estimated word satisfies parity checks of all check nodes, correction data corresponding to a data portion of the temporary estimated word is output to the host device (not shown) via the data buffer 116. If correction data is encoded according to a BCH code as an outer code, the correction data may be output to the BCH decoder 115. The BCH decoder 115 generates correction data by BCH-decoding input data and outputs the correction data to the host device (not shown) via the data buffer 116.
As exemplified in
To realize such parallel processing, it is necessary to avoid a collision of write access of the LLR to the LMEM 105 accompanying column processing on one or more valid blocks arranged in the first row group and read access of the LLR from the LMEM 105 accompanying row processing on one or more valid blocks arranged in the second row group. That is, it is impossible to perform read and write processing on the LLR of the same variable node at the same time. Further, before write processing of the LLR accompanying column processing on a valid block arranged in a column group of the first row group is completed, it is impossible to start read processing of the LLR accompanying row processing on a valid block arranged in the same column group of the second row group.
To avoid such an access collision, a restriction described later may be imposed on the structure of a parity check matrix. The restriction may be, for example, to insert at least Z invalid blocks between valid blocks in each column group of the parity check matrix. Z is an integer equal to 1 or greater. In other words, the restriction corresponds to not arranging a plurality of valid blocks consecutively in a row direction in each column group of the parity check matrix. The parity check matrix exemplified in
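For illustration only, this restriction may be checked on a base matrix whose blocks are represented by their shift values (an integer for a valid block, None for an invalid block; the representation and the test matrices are illustrative, not part of the embodiments):

```python
def satisfies_restriction(base, Z=1):
    """True if, in every column group of the base matrix, any two valid blocks
    (non-None entries) are separated by at least Z invalid blocks in the row
    direction."""
    for col in zip(*base):
        last = None
        for r, s in enumerate(col):
            if s is not None:
                if last is not None and r - last - 1 < Z:
                    return False
                last = r
    return True
```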
If the parity check matrix satisfies this restriction, the column group position of one or more valid blocks intended for column processing in any first row group does not overlap with the column group position of one or more valid blocks intended for row processing in the second row group subsequent to the first row group. Therefore, even if the column processing and the row processing are performed in parallel, a collision of access to the LMEM 105 does not occur.
However, from the viewpoint of implementing an error correction decoder, difficulty in imposing the restriction on a portion (particularly, a parity portion) of the parity check matrix can be expected. Thus, an error correction decoder according to the present embodiment may adaptively perform various kinds of scheduling by a scheduler (not shown).
It is assumed that, as exemplified in
Therefore, if, as exemplified in
When such an access collision is expected (that is, the column group position of one or more valid blocks arranged in the first row group overlaps with the column group position of one or more valid blocks arranged in the second row group whose processing order is later than that of the first row group), a scheduler (not shown) may perform simple scheduling. The simple scheduling is equivalent to delaying, when compared with the preset timing, the read timing of the LLR corresponding to the variable nodes belonging to each of one or more valid blocks arranged in the row group intended for row processing until no access collision occurs. The preset timing is, for example, the read timing when no access collision occurs. According to the example in
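For illustration only, the simple scheduling may be sketched as the following software model under simplifying assumptions (writes of the column processing occupy time slots 0, 1, 2, ... in order, and a read of a column group must wait until the slot after its colliding write; all identifiers are illustrative):

```python
def schedule_reads(write_cols, read_cols):
    """Return the time slot assigned to each read, delayed past any colliding
    write. write_cols / read_cols list the column group positions of the valid
    blocks intended for column processing and row processing, respectively."""
    write_slot = {c: t for t, c in enumerate(write_cols)}
    slots, t = [], 0
    for c in read_cols:
        if c in write_slot:
            # Collision: delay the read until the write has completed.
            t = max(t, write_slot[c] + 1)
        slots.append(t)
        t += 1
    return slots
```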
Instead of the simple scheduling or in addition to the simple scheduling, the scheduler may perform detailed scheduling. The detailed scheduling is equivalent to changing (interchanging) the write order of a plurality of valid blocks intended for column processing or the read order of a plurality of valid blocks intended for row processing so as to avoid an access collision.
When an access collision occurs in a valid block arranged in the row group intended for column processing (that is, the column group position of the valid block matches the column group position of one of one or more valid blocks arranged in the row group intended for row processing), the detailed scheduling may include changing the write order of the LLR corresponding to the variable nodes belonging to the valid block such that the order is earlier than the preset order. The preset order is, for example, the write order when no access collision occurs. According to the example in
In the example of
When an access collision occurs in a valid block arranged in the row group intended for row processing (that is, the column group position of the valid block matches the column group position of one of one or more valid blocks arranged in the row group intended for column processing), the detailed scheduling may include changing the read order of the LLR corresponding to the variable nodes belonging to the valid block such that the order is later than the preset order. The preset order is, for example, the read order when no access collision occurs. According to the example in
According to the example in
An error correction decoder according to the first embodiment performs, as has been described above, column processing on each of one or more valid blocks arranged in the first row group and row processing on each of one or more valid blocks arranged in the second row group whose processing order is later than that of the first row group in parallel. Therefore, according to the error correction decoder, error correction decoding processing of the block-based parallel processing scheme can be performed at high speed.
Incidentally, the BMEM 107 can be deleted from the error correction decoder in
LMEM contained in an error correction decoder according to the aforementioned first embodiment is typically implemented by using 2- (or 4-) port SRAM capable of processing read access and write access at the same time. From the viewpoint of cost reduction, however, implementation of LMEM using 1-port SRAM may be desired.
An error correction decoder exemplified in
An error correction decoder according to the second embodiment includes, as has been described above, BMEM for reading/writing β and also performs column processing on each of one or more valid blocks arranged in the first row group and row processing on each of one or more valid blocks arranged in the second row group whose processing order is later than that of the first row group sequentially. Therefore, according to the error correction decoder, LMEM can be implemented by using 1-port SRAM without loss of the speed of error correction decoding processing of the block-based parallel processing scheme.
At least a portion of processing in each of the above embodiments can be realized by using a general-purpose computer as basic hardware. A program to realize the processing in each of the above embodiments may be provided by being stored in a computer readable storage medium. The program is stored in the storage medium as a file in an installable format or a file in an executable format. The storage medium includes a magnetic disk, an optical disk (CD-ROM, CD-R, DVD and the like), a magneto-optical disk (MO and the like), and a semiconductor memory. Any storage medium that can store a program and can be read by a computer may be used. In addition, the program to realize the processing of each of the above embodiments may be stored on a computer (server) connected to a network such as the Internet to allow a computer (client) to download the program via the network.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application claims the benefit of U.S. Provisional Application No. 61/911,115, filed Dec. 3, 2013, the entire contents of which are incorporated herein by reference.