This application claims the benefit of priority of Singapore patent application No. 10201401824Q, filed 25 Apr. 2014, the content of it being hereby incorporated by reference in its entirety for all purposes.
Various embodiments relate to a decoder for a memory device, a memory device and a method of decoding a memory device.
Emerging non-volatile memory (NVM) devices, including phase change memory (PCM), spin transfer torque magnetoresistive random-access memory (STT-MRAM), resistive random-access memory (ReRAM) and so on, are desired in various applications where high data quality is required. For example, NVM may be used for code storage in handphones and automotive applications, and for data cache in data centres.
However, emerging NVM devices may suffer from data errors for various reasons. NVM may suffer from process variation issues as memory processes scale down aggressively. Moreover, each type of NVM may have its own specific reliability challenges. For example, PCM may suffer from resistance drift, and drift-induced errors may increase over time, so multiple-bit errors may be expected to be common. STT-MRAM may exhibit intrinsically asymmetric magnetic tunnel junction (MTJ) switching, so the write error rate may be much larger for writing bit ‘1’ than for writing bit ‘0’.
Reliability challenges at the device level may be mitigated at the system level by using signal processing and error correction code (ECC) techniques. ECC is commonly employed in semiconductor memory devices. An ECC system generally includes encoding and decoding. Encoding adds parity bits to the original data to form a codeword, which is written to the memory cells. Decoding finds the errors in the data read from memory and recovers the data stored in the memory cells.
Conventionally, Hamming code, a type of ECC with single-error correction and double-error detection (SEC-DED), may be applied in memory devices. However, as memory devices move to smaller cell sizes and higher densities, stability issues due to process variation may worsen, leading to a higher bit error rate. Consequently, a stronger ECC capable of correcting multiple errors may be or may become indispensable in memory devices.
In addition, emerging NVMs are high-speed memories, so an ECC decoder may be expected to add minimal memory access latency. A small decoder area may also be desirable since memory products may be highly cost-sensitive.
Bose-Chaudhuri-Hocquenghem (BCH) code is a powerful ECC technique that is able to correct multiple random errors. BCH code is based on Galois field (GF) theory and therefore has an algebraic decoding algorithm. BCH code is widely used in communication systems, digital video systems, and solid state drives. Generally, BCH decoding may include three pipeline stages, namely, (i) calculating syndrome vectors from the received data; (ii) determining the error locator polynomial (ELP) from the syndromes; and (iii) performing a Chien search with the ELP to identify the error locations. BCH decoding is conventionally a serial process, implemented over a number of clock cycles to complete the three stages, where the first and third stages may be realized with a linear feedback shift register (LFSR) structure and the second stage may be implemented with an iterative algorithm. A large error correction capability (e.g., t>5) may require such a serial implementation of BCH decoding. However, such slow BCH decoding can hardly be applied in high-speed memory devices with access times on the order of tens of nanoseconds, and is instead used in, e.g., communication and digital television systems.
Some techniques for comparatively faster decoding have been developed. For example, a pre-defined look-up table may be employed, where the syndromes are used to index the table and each indexed row directly provides the locations of the erroneous bits. However, this technique is usually limited to double-error-correcting (DEC) BCH code because the table size grows excessively large as the number of errors to be corrected increases.
An alternative technique may be to design a fully parallel BCH decoder implemented entirely with combinational logic circuitry. Such a parallel implementation may be realized without performing any iteration. However, a shortcoming of this technique may be that, in order to achieve low latency, the area of the bit-parallel decoder may be significantly large, and this area grows linearly with the codeword length. As such, this parallel implementation of BCH decoding may only handle a small error correction capability (e.g., t<5), and may be used in optical and memory systems.
In other words, the whole decoder 200 is inserted into the read path with full-parallel implementation as shown in
A BCH code is a widely used ECC code that is developed on the theory of Galois fields (GF) and is able to correct multiple-bit random errors. A BCH code may be characterized by the following parameters: codeword length $n$, information data length $k$, error correction capability $t$, and degree of the GF $m$, in which $n = 2^m - 1$ and $n - k \le mt$. A BCH ECC system may include a BCH encoder and a BCH decoder. BCH encoding may be used to encode $k$-bit information data into an $n$-bit codeword with a generator polynomial. The information data vector may be denoted as $(u_{k-1}, u_{k-2}, \ldots, u_0)$ and the codeword vector as $(v_{n-1}, v_{n-2}, \ldots, v_0)$. The corresponding polynomial forms may be represented as $u(x) = u_{k-1}x^{k-1} + u_{k-2}x^{k-2} + \cdots + u_0$ and $v(x) = v_{n-1}x^{n-1} + v_{n-2}x^{n-2} + \cdots + v_0$, respectively. The generator polynomial may be obtained over GF($2^m$) and represented as $g(x) = g_{n-k}x^{n-k} + g_{n-k-1}x^{n-k-1} + \cdots + g_0$.
For a given BCH(n, k, t) code, the relationship between u(x), g(x), and v(x) may be given by the following equation:
$$v(x) = u(x)\,x^{n-k} + \left(u(x)\,x^{n-k}\right) \bmod g(x) \qquad \text{Equation [1]}$$
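To make Equation [1] concrete, the following is a minimal Python sketch of systematic BCH encoding, assuming a toy BCH(15, 7, 2) code over GF($2^4$) whose generator polynomial is the product of the minimal polynomials of $\alpha$ and $\alpha^3$ for the primitive polynomial $x^4 + x + 1$. These parameters and the helper names `gf2_poly_mod` and `bch_encode` are illustrative assumptions, not taken from the embodiments; polynomials are stored as Python integers with bit $i$ holding the coefficient of $x^i$.

```python
# Toy BCH(15, 7, t=2) code; parameters are illustrative assumptions, not from the text.
N, K = 15, 7
G = 0b111010001  # g(x) = x^8 + x^7 + x^6 + x^4 + 1 = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)


def gf2_poly_mod(a: int, m: int) -> int:
    """Remainder of a(x) / m(x) over GF(2); bit i of each int holds the x^i coefficient."""
    deg_m = m.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)  # subtraction over GF(2) is XOR
    return a


def bch_encode(u: int) -> int:
    """Systematic encoding per Equation [1]: v(x) = u(x) x^(n-k) + (u(x) x^(n-k)) mod g(x)."""
    if not 0 <= u < (1 << K):
        raise ValueError("message must fit in k bits")
    shifted = u << (N - K)                      # message occupies the top k positions
    return shifted ^ gf2_poly_mod(shifted, G)   # parity bits fill the low n-k positions


codeword = bch_encode(0b1011001)
assert gf2_poly_mod(codeword, G) == 0  # every codeword is a multiple of g(x)
```

Because the message bits pass through unchanged into the top $k$ positions, the codeword is systematic; the parity bits are simply the remainder of the shifted message modulo $g(x)$.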
In memory devices, data encoding may occur during memory write operation. After encoding, a codeword may be written into one page in the memory array.
A typical BCH decoder 200 may include three main modules, namely, a syndrome generator 202, an ELP solver 204, and a Chien search module (interchangeably referred to as a Chien search circuitry) 206. As shown in
$$r(x) = v(x) + e(x) \qquad \text{Equation [2]}$$
where v(x) is the valid BCH codeword and e(x) indicates the errors in the received vector.
Equation [2] may be performed by a summing circuit 208.
Syndromes may be computed from the received vector by performing a modulo division of $r(x)$ by the minimal polynomial over GF($2^m$) and evaluating the remainder at $\alpha^i$, as shown in Equation [3]:
$$S_i = \left[r(x) \bmod \psi_i(x)\right]_{x=\alpha^i} = r(\alpha^i), \qquad i = 1, 3, 5, \ldots, 2t-1 \qquad \text{Equation [3]}$$
where $\psi_i(x)$ is the minimal polynomial of the element $\alpha^i$ over GF($2^m$).
For binary BCH code, only the odd-index syndromes may need to be computed using the above Equation [3] because the even-index syndromes may be obtained using the following property:
$$S_{2i} = (S_i)^2, \qquad i = 1, \ldots, t \qquad \text{Equation [4]}$$
The syndrome values may indicate whether there are errors in the received data. For example, if all the syndromes are zero, it may be indicated that the received data is a valid codeword and no error exists. Otherwise, if any one syndrome is non-zero, at least one error exists.
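The syndrome computation of Equations [3] and [4] and the zero-syndrome check can be sketched as follows, assuming the same toy GF($2^4$) field (primitive polynomial $x^4 + x + 1$) and a $t = 2$ code; the names `syndromes`, `poly_eval` and `PSI` are hypothetical. Odd-index syndromes are obtained by reducing $r(x)$ modulo $\psi_i(x)$ and evaluating the remainder at $\alpha^i$, and even-index syndromes by squaring.

```python
# GF(2^4) built from the primitive polynomial x^4 + x + 1 (an illustrative choice).
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x   # EXP[i] = alpha^i as an m-bit vector
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM

# Minimal polynomials psi_1(x), psi_3(x) of alpha and alpha^3 (enough for t <= 2 here).
PSI = {1: 0b10011, 3: 0b11111}


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p: int, i: int) -> int:
    """Evaluate the binary polynomial p(x) (bit j = coefficient of x^j) at alpha^i."""
    acc = 0
    for j in range(p.bit_length()):
        if (p >> j) & 1:
            acc ^= EXP[(i * j) % N]
    return acc


def gf2_poly_mod(a: int, m: int) -> int:
    deg_m = m.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a


def syndromes(r: int, t: int) -> list[int]:
    """Return [S_1, ..., S_2t]: odd-index via Equation [3], even-index via Equation [4]."""
    s = [0] * (2 * t + 1)                              # s[i] holds S_i; s[0] is unused
    for i in range(1, 2 * t, 2):
        s[i] = poly_eval(gf2_poly_mod(r, PSI[i]), i)   # remainder mod psi_i at alpha^i
    for i in range(1, t + 1):
        s[2 * i] = gf_mul(s[i], s[i])                  # S_2i = (S_i)^2
    return s[1:]


# A single error at bit position 5 gives non-zero syndromes; a valid codeword gives all zeros.
has_error = any(syndromes(1 << 5, 2))
```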
The modulus operation in Equation [3] may typically be implemented with a linear feedback shift register (LFSR) structure. The received data may be sent into the LFSR circuit serially. At each clock cycle, the newly received data bit may be added to the output of the register to produce an intermediate syndrome vector in the registers. The process may be repeated until all the received data have been sent into the LFSR, after which each bit stored in the registers may be associated with an element in the syndrome vector.
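The serial LFSR division described above can be simulated bit by bit. The sketch below is a minimal model, assuming the received bits arrive highest-degree coefficient first, one per clock cycle; the function name `lfsr_remainder` is hypothetical. After the last bit, the register holds $r(x) \bmod \psi_i(x)$, which is then evaluated at $\alpha^i$ to give $S_i$ (for $i = 1$ with a polynomial basis, the remainder bits already equal $S_1$).

```python
def lfsr_remainder(bits_msb_first: list[int], psi: int) -> int:
    """Bit-serial LFSR division: returns r(x) mod psi(x) after all bits are shifted in.

    bits_msb_first lists r_{n-1}, ..., r_0 (one bit per simulated clock cycle);
    psi is the divisor, with bit j holding the coefficient of x^j.
    """
    deg = psi.bit_length() - 1
    mask = (1 << deg) - 1
    taps = psi & mask                  # feedback taps: psi(x) without its leading x^deg term
    reg = 0                            # deg-bit shift register, bit j ~ coefficient of x^j
    for bit in bits_msb_first:
        feedback = (reg >> (deg - 1)) & 1   # coefficient pushed up to x^deg by the shift
        reg = ((reg << 1) & mask) | bit     # multiply by x and add the new input bit
        if feedback:
            reg ^= taps                     # replace x^deg by the lower terms of psi(x)
    return reg


# Example: x^5 mod (x^4 + x + 1) = x^2 + x
remainder = lfsr_remainder([1, 0, 0, 0, 0, 0], 0b10011)
```

Only $\deg \psi_i$ flip-flops and a few XOR gates are needed, but $n$ clock cycles are required per syndrome, which is the latency problem the embodiments aim to avoid.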
The calculated syndromes may be sent to the ELP solver 204 to determine the coefficients of the error-location polynomial, as shown in the following:
$$\sigma(x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + \cdots + \sigma_t x^t \qquad \text{Equation [5]}$$
After the error-location polynomial is determined, the Chien search module 206 may be employed to find the error locations and correct the errors. The Chien search, named after R. T. Chien, is a search algorithm for determining the roots of error locator polynomials (or error-location polynomials) over a Galois field.
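A minimal serial Chien search can be sketched as follows, assuming the toy GF($2^4$) field used above; the function name `chien_search` is hypothetical. Each candidate position $i$ is tested by evaluating $\sigma(\alpha^{-i})$; a zero result marks bit $i$ as erroneous, because $\sigma(x) = \prod_l (1 + \alpha^{i_l} x)$ vanishes exactly at $x = \alpha^{-i_l}$.

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def chien_search(sigma: list[int]) -> list[int]:
    """Return every position i in [0, N) such that sigma(alpha^-i) == 0.

    sigma[j] holds the coefficient sigma_j of Equation [5] (sigma_0 = 1).
    """
    positions = []
    for i in range(N):
        acc = 0
        for j, coef in enumerate(sigma):
            acc ^= gf_mul(coef, EXP[(-i * j) % N])   # sigma_j * (alpha^-i)^j
        if acc == 0:
            positions.append(i)
    return positions


# sigma(x) = (1 + alpha^2 x)(1 + alpha^9 x), i.e. errors at bit positions 2 and 9
sigma = [1, EXP[2] ^ EXP[9], gf_mul(EXP[2], EXP[9])]
error_positions = chien_search(sigma)
```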
Now turning back to
A BCH decoder is usually implemented with the LFSR structure and an iterative Berlekamp-Massey (BM) algorithm for obtaining the coefficients of the error-location polynomial. The BM algorithm is an iterative algorithm which first initializes the coefficients, then at each iteration computes a discrepancy from the syndrome values and the current coefficients, and updates the coefficients for the next iteration according to the discrepancy value. Iterations may be repeated t times to obtain the final results. Generally, the BM algorithm may be implemented with sequential logic circuitry, taking t clock cycles to complete the iterations. This iterative algorithm may be suitable for a large number of correctable errors t (t>5).
According to the above description of the BCH decoding process, the conventional BCH decoder can hardly be applied in high-speed memory devices, as it may significantly degrade read performance. Although a BCH decoder realized entirely with combinational logic has been proposed, it may be limited to double error correction (DEC) BCH code or may have an excessively large area due to the bit-parallel Chien search.
Therefore, there is a need for a BCH decoder, or an improved BCH decoder, in memory devices that achieves a significantly short (minimum) decoding latency so as to satisfy fast memory read access, while minimizing the concomitant increase in gate count so as to save silicon area of semiconductor memory devices and effectively reduce overall chip cost, thereby addressing at least the problems above.
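For reference, the following is a minimal sketch of the Berlekamp-Massey algorithm over the toy GF($2^4$) field, assuming the syndrome list $[S_1, \ldots, S_{2t}]$ as input; the function name `berlekamp_massey` is hypothetical. It is the general form that visits all $2t$ syndromes; for binary BCH codes every second discrepancy is zero, which is why a binary implementation can finish in $t$ iterations.

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    return EXP[N - LOG[a]]


def berlekamp_massey(s: list[int]) -> list[int]:
    """s = [S_1, ..., S_2t]; returns [sigma_0 = 1, sigma_1, ..., sigma_L]."""
    C, B = [1], [1]      # current polynomial and the copy saved at the last length change
    L, m, b = 0, 1, 1    # current length, steps since last change, discrepancy saved then
    for n in range(len(s)):
        d = s[n]                                 # discrepancy for S_(n+1)
        for j in range(1, L + 1):
            d ^= gf_mul(C[j], s[n - j])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b))
        T = C[:]
        C += [0] * max(0, len(B) + m - len(C))
        for j, bj in enumerate(B):               # C(x) <- C(x) - (d / b) x^m B(x)
            C[j + m] ^= gf_mul(coef, bj)
        if 2 * L <= n:                           # the polynomial length must grow
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]
```

The data-dependent loop and branch are what force a sequential, multi-cycle hardware implementation of this step.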
According to an embodiment, a decoder for a memory device is provided. The decoder may include an error detection circuitry configured to multiply a vector of one or more data words for which an error detection is to be carried out with a parity matrix to determine a plurality of syndrome values and generate a plurality of coefficients from multiplying a syndrome vector with an inverse of a syndrome matrix, wherein both the syndrome vector and the syndrome matrix include the plurality of syndrome values; and an error correction circuitry configured to perform a Chien search on a first part of the plurality of coefficients to determine a first set of error indicators indicating error locations in a first part of the one or more data words, and subsequently perform a Chien search on a second part of the plurality of coefficients to determine a second set of error indicators indicating error locations in a second part of the one or more data words.
According to an embodiment, a memory device is provided. The memory device may include a sense amplifier circuitry configured to provide one or more data words; a decoder including: an error detection circuitry configured to multiply a vector of the one or more data words for which an error detection is to be carried out with a parity matrix to determine a plurality of syndrome values and generate a plurality of coefficients from multiplying a syndrome vector with an inverse of a syndrome matrix, wherein both the syndrome vector and the syndrome matrix include the plurality of syndrome values; and an error correction circuitry configured to perform a Chien search on a first part of the plurality of coefficients to determine a first set of error indicators indicating error locations in a first part of the one or more data words, and subsequently perform a Chien search on a second part of the plurality of coefficients to determine a second set of error indicators indicating error locations in a second part of the one or more data words; and a data register configured to store the one or more data words and the plurality of coefficients, wherein the error detection circuitry is arranged between the sense amplifier circuitry and the data register.
According to an embodiment, a method of decoding a memory device is provided. The method may include multiplying a vector of one or more data words for which an error detection is to be carried out with a parity matrix to determine a plurality of syndrome values; generating a plurality of coefficients from multiplying a syndrome vector with an inverse of a syndrome matrix, wherein both the syndrome vector and the syndrome matrix include the plurality of syndrome values; performing a Chien search on a first part of the plurality of coefficients to determine a first set of error indicators indicating error locations in a first part of the one or more data words; and subsequently performing a Chien search on a second part of the plurality of coefficients to determine a second set of error indicators indicating error locations in a second part of the one or more data words.
In the drawings, like reference characters generally refer to like parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:
The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
Embodiments described in the context of one of the methods or devices are analogously valid for the other methods or devices. Similarly, embodiments described in the context of a method are analogously valid for a device, and vice versa.
Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.
In the context of various embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.
In the context of various embodiments, the phrase “at least substantially” may include “exactly” and a reasonable variance.
In the context of various embodiments, the term “about” or “approximately” as applied to a numeric value encompasses the exact value and a reasonable variance.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
As used herein, the phrase of the form of “at least one of A or B” may include A or B or both A and B. Correspondingly, the phrase of the form of “at least one of A or B or C”, or including further listed items, may include any and all combinations of one or more of the associated listed items.
Various embodiments may provide a low-latency and area-efficient Bose-Chaudhuri-Hocquenghem (BCH) decoder for a non-volatile memory (NVM).
Various embodiments may relate to the field of data error correction in memory devices, and more particularly to binary BCH code decoder implementation in memory devices.
Various embodiments may provide a hardware decoder of binary BCH code for a memory device that provides significantly fast decoding speed and relatively low complexity. A BCH decoder architecture may be designed by exploiting the unique feature of data flow conversion in a memory read path. The BCH decoder may include two portions, namely, the error detection circuitry and the error correction circuitry. Each portion may be located along a corresponding data path in the memory, and may be designed with a specific circuit structure.
The error detection circuitry may include a syndrome generator and an error location polynomial module. The error detection circuitry may be located along a parallel data path between a sense amplifier and a data register in the memory. The error detection circuitry may be implemented entirely with combinational logic in a fully parallel manner in order to minimize the memory access latency overhead. The error correction circuitry may include an index control circuitry and a Chien search circuitry. The error correction circuitry may be located along a serial data path between the data register and an I/O interface in the memory. The error correction circuitry may be directed towards a small-area solution in which the Chien search module may be configured such that the starting search index is controlled by a memory column address and the number of bits processed per clock cycle is determined by the number of I/O ports of the memory device. In other words, the architecture may enable the BCH decoder in accordance with various embodiments to reduce both memory access latency and silicon area.
In the context of various embodiments, the plurality of syndrome values may indicate a presence of at least one error in the one or more data words, while the plurality of coefficients may indicate the number of errors in the one or more data words. Further, the first set of error indicators may include at least one error indicator indicating at least one error location in the first part of the one or more data words, while the second set of error indicators may include at least one error indicator indicating at least one error location in the second part of the one or more data words. The one or more data words may include a page read out of a memory array of the memory device in parallel. The one or more data words may be of a 32-byte page size or a 64-byte page size. The first part of the one or more data words may be distinct from the second part of the one or more data words. As such, the first part and the second part of the one or more data words may not overlap each other.
In other words, the error detection circuitry 302 may be configured to parallely process one or more data words to determine the plurality of syndrome values and the plurality of coefficients. “Parallely process” with respect to the one or more data words means to carry out an operation on the one or more data words in its entirety, i.e., on all bits of the one or more data words at at least substantially the same time (e.g., in a parallel manner). The error correction circuitry 304 may be configured to first process one part (e.g., the first part) of the plurality of coefficients to locate at least one error in the first part of the one or more data words. Once completed, the error correction circuitry 304 may be configured to then process a subsequent part (e.g., the second part) of the plurality of coefficients to locate at least one error in a subsequent part of the one or more data words. The error correction circuitry 304 may be configured to continue processing further parts of the plurality of coefficients to locate at least one error in each of the further parts of the one or more data words in a similar manner, thereby, in effect, serially (sequentially) performing a Chien search on the plurality of coefficients.
In various embodiments, the error detection circuitry 302 may be arranged along a parallel memory read path of the memory device.
In various embodiments, the error detection circuitry 302 may include a syndrome generator configured to multiply the vector of one or more data words with the parity matrix including elements of a Galois Field to determine the plurality of syndrome values including odd-index syndrome values.
In other words, the parity matrix may include elements of a Galois field, which are expressed as powers of $\alpha$, $\alpha$ being the primitive element over GF($2^m$), and the plurality of syndrome values may include odd-index syndrome values, e.g., $S_1$, $S_3$, $S_5$, and so on.
In various embodiments, the syndrome generator may further be configured to determine even-index syndrome values $S_{2i}$ based on the odd-index syndrome values $S_{2i-1}$ and the property $S_{2i} = (S_i)^2$, where $i = 1, \ldots, t$, $t$ being an error correction capability of the decoder 300. The error correction capability may be an integer value. For example, the error correction capability may be less than or equal to 5.
In various embodiments, the syndrome generator may include a plurality of logic trees, each of the plurality of logic trees configured to receive and process each data word of the one or more data words to generate the plurality of syndrome values at at least substantially the same time.
In the context of various embodiments, the phrase “at least substantially the same time” may mean at least substantially simultaneously.
The logic tree as described herein may include a logic XOR tree. To form an XOR-tree circuit structure, each of the plurality of logic XOR trees may include a combinational arrangement of XOR logic gates and may perform modulo-2 addition of each data word of the vector of one or more data words.
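The parity-matrix multiplication and its XOR-tree realization can be modelled as below, assuming the toy GF($2^4$) field and $t = 2$. Each row of the binary parity matrix corresponds to one bit $k$ of one odd-index syndrome $S_i$, and its fan-in is the set of positions $j$ for which bit $k$ of $\alpha^{ij}$ is set, so each syndrome bit is one XOR tree over selected data bits. The names `build_xor_trees` and `parallel_syndromes` are hypothetical.

```python
# GF(2^4) antilog table for the primitive polynomial x^4 + x + 1 (illustrative assumption).
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP = [0] * N
x = 1
for i in range(N):
    EXP[i] = x
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def build_xor_trees(t: int) -> list[list[int]]:
    """One XOR tree per row of H: for odd i and bit k, the positions j whose
    alpha^(i*j) has bit k set. Row (i, k) of H times r gives bit k of S_i."""
    trees = []
    for i in range(1, 2 * t, 2):
        for k in range(M):
            trees.append([j for j in range(N) if (EXP[(i * j) % N] >> k) & 1])
    return trees


def parallel_syndromes(r_bits: list[int], trees: list[list[int]]) -> list[int]:
    """Evaluate every tree on the whole received word (r_bits[j] = r_j); in hardware
    all trees settle in one combinational pass. Returns [S_1, S_3, ...]."""
    out = []
    for taps in trees:
        acc = 0
        for j in taps:
            acc ^= r_bits[j]              # modulo-2 addition of the selected data bits
        out.append(acc)
    # regroup each run of M output bits into one GF(2^M) element
    return [sum(out[g * M + k] << k for k in range(M)) for g in range(len(out) // M)]
```

Since the tree fan-ins are fixed by the code, they can be hard-wired, which is what allows the syndrome generator to sit directly on the parallel path between the sense amplifier and the data register.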
In various embodiments, the syndrome vector may include the plurality of syndrome values or at least part of the plurality of syndrome values. The syndrome matrix may include the plurality of syndrome values or at least part of the plurality of syndrome values.
In various embodiments, the error detection circuitry 302 may further include an error locator polynomial (ELP) solver configured to generate the plurality of coefficients from multiplying the syndrome vector with the inverse of the syndrome matrix, wherein the syndrome vector may further include the even-index syndrome values $S_{2i}$, where $i = 1, \ldots, t$; and wherein the syndrome matrix may further include the even-index syndrome values $S_{2i}$, where $i = 1, \ldots, t-1$.
It should be appreciated that the syndrome vector is different from the syndrome matrix.
For example, the syndrome vector may include a column vector having a size of A×1, and the syndrome matrix may be an A×A matrix. In this example, for the plurality of coefficients of
the elements in the syndrome vector may be arranged starting from $S_{t+1}$ to $S_{2t}$ in a consecutive order, e.g., the syndrome vector may be
and the syndrome matrix may be
The relationship between the syndrome values and the plurality of coefficients may be based on Newton's identities. It should be appreciated that the syndrome vector and the syndrome matrix may take different forms or arrangements.
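As an illustration of obtaining the coefficients by inverting the syndrome matrix, the sketch below assumes one standard arrangement from Newton's identities for binary codes: a $t \times t$ matrix whose entry $(r, c)$ is $S_{r+c+1}$, a syndrome vector running from $S_{t+1}$ to $S_{2t}$, and the unknowns ordered $(\sigma_t, \ldots, \sigma_1)$. This arrangement, the toy GF($2^4$) field and the names `solve_gf` and `pgz_coefficients` are assumptions. Gaussian elimination is used in place of an explicit inverse (same result), and, as in the Peterson-Gorenstein-Zierler procedure, the system size is reduced when the matrix is singular (fewer than $t$ errors).

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    return EXP[N - LOG[a]]


def solve_gf(A: list[list[int]], b: list[int]):
    """Gauss-Jordan elimination over GF(2^M); returns x with A x = b, or None if singular."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]       # normalize the pivot row
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [a ^ gf_mul(f, p) for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def pgz_coefficients(s: list[int], t: int) -> list[int]:
    """s = [S_1, ..., S_2t]. Returns [1, sigma_1, ..., sigma_v] for the largest solvable v <= t."""
    S = [0] + list(s)                                            # S[i] == S_i
    for v in range(t, 0, -1):
        A = [[S[r + c + 1] for c in range(v)] for r in range(v)]   # entries S_1 .. S_(2v-1)
        rhs = [S[v + r + 1] for r in range(v)]                     # S_(v+1) .. S_(2v)
        sol = solve_gf(A, rhs)                                     # [sigma_v, ..., sigma_1]
        if sol is not None:
            return [1] + sol[::-1]
    return [1]                                                   # all syndromes zero: no error
```

For small $t$ the solution can be written in closed form; for $t = 2$, $\sigma_1 = (S_2 S_3 + S_1 S_4)/(S_1 S_3 + S_2^2)$ and $\sigma_2 = (S_3^2 + S_2 S_4)/(S_1 S_3 + S_2^2)$, which is amenable to purely combinational logic.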
In another non-limiting example, for the plurality of coefficients of
the syndrome vector may take a form of
and the syndrome matrix may take a form of
Regardless of the forms or arrangements the syndrome vector and syndrome matrix may take, the plurality of coefficients determined in each situation (or each formulation) would result in the same respective values.
In other embodiments, the error detection circuitry 302 may further include an error locator polynomial (ELP) solver configured to generate the plurality of coefficients by applying a Peterson-Gorenstein-Zierler (PGZ) algorithm on the plurality of syndrome values. The ELP solver may include a plurality of square circuits, each configured to determine a square syndrome value for each of the plurality of syndrome values; and a plurality of process elements configured to generate the plurality of coefficients based on the square syndrome values and the plurality of syndrome values.
For example, each of the plurality of square circuits may include a summing circuit configured to perform an addition of selected syndrome values of the plurality of syndrome values. Further, each of the plurality of process elements may include a combination of XOR logic gates and AND logic gates.
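Squaring in GF($2^m$) is linear over GF(2), so a square circuit can be a pure XOR network in which each output bit is the sum of selected input bits. The sketch below derives such a network for the toy field GF($2^4$) with $\alpha^4 = \alpha + 1$ (an assumption; the embodiments do not fix the field) and checks it against table-based multiplication. The name `square_xor` is hypothetical.

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def square_xor(a: int) -> int:
    """(a0 + a1 x + a2 x^2 + a3 x^3)^2 = a0 + a1 x^2 + a2 x^4 + a3 x^6,
    reduced with x^4 = x + 1 and x^6 = x^3 + x^2: only XOR gates are needed."""
    a0, a1, a2, a3 = ((a >> k) & 1 for k in range(4))
    b0 = a0 ^ a2
    b1 = a2
    b2 = a1 ^ a3
    b3 = a3
    return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3)
```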
The PGZ algorithm will be described in more detail below in relation to Equation [7].
In various embodiments, the error correction circuitry 304 may be arranged along a serial memory read path of the memory device.
In various embodiments, the error correction circuitry 304 may include an index control circuitry configured to receive a column address of the one or more data words to determine a starting search index. The index control circuitry may include a plurality of look-up tables (LUTs) configured to convert the column address to the starting search index.
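The look-up-table contents can be sketched as follows, under hypothetical assumptions not stated in the embodiments: column address $c$ delivers bit positions $c \cdot p$ to $c \cdot p + p - 1$ of the page, where $p$ is the number of bits per output cycle, so the starting search index is $i_0 = c \cdot p$. Each LUT entry then stores the constants $\alpha^{-j i_0}$ for $j = 1, \ldots, t$ that preload the Chien search registers. The toy GF($2^4$) field and the name `build_start_lut` are illustrative.

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP = [0] * N
x = 1
for i in range(N):
    EXP[i] = x
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def build_start_lut(num_columns: int, p: int, t: int) -> list[list[int]]:
    """Hypothetical LUT: entry c holds alpha^(-j * i0) for j = 1..t, with i0 = c * p
    the first bit position delivered for column address c."""
    lut = []
    for c in range(num_columns):
        i0 = c * p
        lut.append([EXP[(-j * i0) % N] for j in range(1, t + 1)])
    return lut


# With p = 5 bits per cycle, a 15-bit page spans three column addresses.
START_LUT = build_start_lut(3, 5, 2)
```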
In various embodiments, the error correction circuitry 304 may further include a Chien search module configured to select, based on the starting search index, the first part of the plurality of coefficients from the plurality of coefficients, and to perform the Chien search on the first part of the plurality of coefficients.
In various embodiments, the Chien search module may be configured to determine the first set of error indicators based on roots of an error locator polynomial, wherein the error locator polynomial includes the first part of the plurality of coefficients.
In various embodiments, the Chien search module may further be configured to select, based on the starting search index, the second part of the plurality of coefficients from the plurality of coefficients, and to perform the Chien search on the second part of the plurality of coefficients.
In various embodiments, the Chien search module may be configured to determine the second set of error indicators based on roots of an error locator polynomial, wherein the error locator polynomial includes the second part of the plurality of coefficients.
In a Chien search, an error is determined to be at location index $i$ if $\alpha^{-i}$ is determined to be a root of the error locator polynomial, where $\alpha$ is a primitive element over a Galois field. The Chien search module may have a degree of parallelism determined by the number of input-output (I/O) ports of the memory device. The degree of parallelism may refer to the number of bits processed at each clock cycle by the Chien search module. In various embodiments, the Chien search module may have a degree of parallelism equal to or double the number of input-output (I/O) ports of the memory device. For example, the degree of parallelism may be in a range of 8 bits to 64 bits. A Chien search algorithm will be described in more detail below in relation to Equation [8].
In various embodiments, the Chien search module may include a plurality of multipliers configured to multiply the starting search index with the first part of the plurality of coefficients or the second part of the plurality of coefficients.
The Chien search module may further include a plurality of registers configured to store a plurality of multiplication results of the starting search index and the first part of the plurality of coefficients, or a plurality of multiplication results of the starting search index and the second part of the plurality of coefficients.
In the context of various embodiments, the term “store” in relation to the plurality of registers in the Chien search module may mean to temporarily store for a subsequent cycle of operation. In other words, the plurality of registers may store the multiplication results for a next cycle of operation.
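The register-and-multiplier structure described above can be modelled as a windowed Chien search, sketched here under the same hypothetical assumptions as the LUT example (toy GF($2^4$) field, $p$ bits per cycle, consecutive bit positions per cycle). Register $j$ is preloaded with $\sigma_j \alpha^{-j i_0}$; in each cycle, $p$ positions are tested, and each register is then multiplied by the constant $\alpha^{-jp}$ and stored for the next cycle. The names `ChienWindow` and `run_windows` are hypothetical.

```python
M, PRIM = 4, 0b10011                 # GF(2^4), primitive polynomial x^4 + x + 1 (assumed)
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


class ChienWindow:
    """Hypothetical p-bit-per-cycle Chien search that starts at bit position i0."""

    def __init__(self, sigma: list[int], i0: int, p: int):
        self.sigma0, self.p = sigma[0], p
        t = len(sigma) - 1
        # start-index multipliers: register j holds sigma_j * alpha^(-j * i0)
        self.reg = [gf_mul(sigma[j], EXP[(-j * i0) % N]) for j in range(1, t + 1)]
        # constant multipliers alpha^(-j * p) advance register j by p positions per cycle
        self.step = [EXP[(-j * p) % N] for j in range(1, t + 1)]

    def cycle(self) -> list[int]:
        """Return error indicators for the current p positions, then update the registers."""
        flags = []
        for k in range(self.p):
            acc = self.sigma0
            for j, r in enumerate(self.reg, start=1):
                acc ^= gf_mul(r, EXP[(-j * k) % N])     # sigma_j * alpha^(-j * (pos + k))
            flags.append(1 if acc == 0 else 0)
        self.reg = [gf_mul(r, s) for r, s in zip(self.reg, self.step)]
        return flags


def run_windows(sigma: list[int], i0: int, p: int, cycles: int) -> list[int]:
    w = ChienWindow(sigma, i0, p)
    flags = []
    for _ in range(cycles):
        flags += w.cycle()
    return flags


# sigma(x) = (1 + alpha^2 x)(1 + alpha^9 x): errors at bit positions 2 and 9
sigma = [1, EXP[2] ^ EXP[9], gf_mul(EXP[2], EXP[9])]
```

The returned indicators line up with the $p$ data bits output in the same cycle, so an addition module can correct them with a bitwise XOR (e.g. `[d ^ f for d, f in zip(data_bits, flags)]`). Only $t$ registers and about $p \cdot t$ constant multipliers are needed, instead of one evaluator per codeword bit.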
In various embodiments, the decoder 300 may include a Bose-Chaudhuri-Hocquenghem (BCH) decoder.
A memory device including a decoder according to various embodiments (e.g., the decoder 300 of
The sense amplifier circuitry 322, the error detection circuitry 302 and the data register 324 are in communication with one another, as denoted by a line 326 which may represent electrical coupling, or physical coupling between the sense amplifier circuitry 322 and the error detection circuitry 302, a line 328 which may represent electrical coupling, or physical coupling between the sense amplifier circuitry 322 and the data register 324, and a line 330 which may represent electrical coupling, or physical coupling between the error detection circuitry 302 and the data register 324. The data register 324 and the error correction circuitry 304 are in communication with each other, as denoted by a line 332 which may represent electrical coupling, or physical coupling between the data register 324 and the error correction circuitry 304.
The decoder 300 of
In the context of various embodiments, the one or more data words to be stored in the data register 324 may be referred to as information bits.
In various embodiments, the memory device 320 may further include an input-output (I/O) interface configured to receive or output data into or from the memory device 320, wherein the error correction circuitry 304 may be arranged between the data register 324 and the I/O interface (not shown in
In various embodiments, the memory device 320 may further include an array of memory cells, wherein the sense amplifier circuitry 322 may be further configured to receive signals from the memory cells to generate the one or more data words. For example, the array of memory cells may include a two-dimensional array of rows (wordlines) and columns (bitlines).
The memory device 320 may further include an address control circuitry configured to provide a row address and a column address. The memory device 320 may further include a row decoder configured to receive the row address to activate a wordline of the array of memory cells. The one or more data words may include a page of data based on the row address.
In various embodiments, the error correction circuitry 304 may be configured to receive the first part of the plurality of coefficients or the second part of the plurality of coefficients based on the column address.
The memory device 320 may further include an output control circuitry configured to select the first part of the one or more data words or the second part of the one or more data words based on the column address.
In other words, the error correction circuitry 304 may operate synchronously with the output control circuitry such that the first set of error indicators generated from the error correction circuitry 304 corresponds to the first part of the one or more data words to be corrected, and the second set of error indicators generated from the error correction circuitry corresponds to the second part of the one or more data words to be corrected.
In various embodiments, the memory device 320 may further include an addition module configured to remove at least one error from the first part of the one or more data words based on the first set of error indicators, or from the second part of the one or more data words based on the second set of error indicators.
In various embodiments, the memory device 320 may include a non-volatile memory device. For example, the memory device 320 may include a phase change memory (PCM), a spin transfer torque magnetoresistive random-access memory (STT-MRAM), or a resistive random-access memory (ReRAM).
The memory device may be described in similar context to the memory device 320 of
In
In various embodiments, multiplying the vector of one or more data words with the parity matrix to determine the plurality of syndrome values at 342 may include detecting a presence of at least one error in the one or more data words.
Prior to the step of multiplying the vector of one or more data words with the parity matrix to determine the plurality of syndrome values at 342, the method may further include receiving the one or more data words. The one or more data words may be generated from signals received from memory cells of the memory device.
In various embodiments, the method may include receiving and processing each data word of the one or more data words to generate the plurality of syndrome values at at least substantially the same time.
In various embodiments, multiplying the vector of one or more data words with the parity matrix at 342 may include multiplying the vector of one or more data words with the parity matrix including elements of a Galois Field to determine the plurality of syndrome values including odd-index syndrome values.
The method may further include determining even-index syndrome values $S_{2i}$ based on the odd-index syndrome values $S_{2i-1}$ and the property $S_{2i} = (S_i)^2$, where $i = 1, \ldots, t$ and t is the error correction capability of the decoder, in accordance with various embodiments.
The syndrome vector may further include the even-index syndrome values $S_{2i}$, where $i = 1, \ldots, t$; and the syndrome matrix may further include the even-index syndrome values $S_{2i}$, where $i = 1, \ldots, t-1$.
In various embodiments, generating the plurality of coefficients at 344 may include applying a Peterson-Gorenstein-Zierler (PGZ) algorithm on the plurality of syndrome values.
For example, a square syndrome value may be determined for each of the plurality of syndrome values and the plurality of coefficients may be generated based on the square syndrome values and the plurality of syndrome values.
In various embodiments, the method may further include receiving a column address of the one or more data words to determine a starting search index. The column address may be converted to the starting search index through look-up tables (LUTs).
In various embodiments, prior to the step of performing the Chien search on the first part of the plurality of coefficients at 346, the method may include selecting, based on the starting search index, the first part of the plurality of coefficients from the plurality of coefficients.
In various embodiments, determining the first set of error indicators at 346 may include determining roots of an error locator polynomial, wherein the error locator polynomial may include the first part of the plurality of coefficients.
In various embodiments, prior to the step of performing the Chien search on the second part of the plurality of coefficients at 348, the method may include selecting, based on the starting search index, the second part of the plurality of coefficients from the plurality of coefficients.
In various embodiments, determining the second set of error indicators at 348 may include determining roots of an error locator polynomial, wherein the error locator polynomial may include the second part of the plurality of coefficients.
In various embodiments, performing the Chien search at 346, 348 may include multiplying the starting search index with the first part of the plurality of coefficients or the second part of the plurality of coefficients.
In various embodiments, performing the Chien search at 346, 348 may further include storing a plurality of multiplication results of the starting search index and the first part of the plurality of coefficients, or a plurality of multiplication results of the starting search index and the second part of the plurality of coefficients. The multiplication results may be stored for a next cycle of operation.
In various embodiments, the method may further include storing the one or more data words and the plurality of coefficients.
In various embodiments, the method may further include providing a row address and a column address of memory cells of the memory device. The method may further include selecting the first part of the one or more data words or the second part of the one or more data words based on the column address.
In various embodiments, the method may further include removing at least one error from the first part of the one or more data words based on the first set of error indicators, or from the second part of the one or more data words based on the second set of error indicators. In doing so, an error-free output may be obtained.
While the method described above is illustrated and described as a series of steps or events, it will be appreciated that any ordering of such steps or events is not to be interpreted in a limiting sense. For example, some steps may occur in different orders and/or concurrently with other steps or events apart from those illustrated and/or described herein. In addition, not all illustrated steps may be required to implement one or more aspects or embodiments described herein. Also, one or more of the steps depicted herein may be carried out in one or more separate acts and/or phases.
Examples of the architecture of a Bose-Chaudhuri-Hocquenghem (BCH) decoder in accordance with various embodiments are described as follows.
The decoder 402 of
As seen in
The error detection circuitry 406 may include a syndrome generator circuitry 420 (or may be simply referred to as a syndrome generator) and an error locator polynomial (ELP) solver circuitry 422 (or may be simply referred to as an ELP solver), which are described with reference to
During a memory read operation, an address control circuitry 428 may first produce a row address 430 and a column address 432 of memory cells. The row address 430 may be fed into a row decoder 434, and a block of data of codeword length may then be read out of a memory array 436. More specifically, each memory cell in the memory array 436 may be coupled to a specific wordline (WL) 438 and bitline (BL) 440 that may constitute a specific cell address. All memory cells on the same WL 438 may be referred to as a page. With the row decoder 434, one WL 438 in the memory array 436 may be selected and a page of data (e.g., a page size of 32 bytes or 64 bytes) may be read out of the memory array 436 in parallel.
The sense amplifier circuitry 412 may make a decision on the content of the memory cells and may generate corresponding binary data (or may be referred to as one or more data words). The one or more data words may then be sent along two distinct paths, A 442 and B 444. Through Path A 442, the information data of the codeword (e.g., the one or more data words) may be stored in an information bits register 446 of the data register 414. As mentioned above, a parallel-to-serial data conversion may exist in the memory read path; hence, the register 446 may be needed to temporarily store the information data. In the meantime, the one or more data words may be sent to the error detection circuitry 406. The syndrome generator 420 may receive the one or more data words and may generate the syndrome vectors. The syndrome values may indicate whether there are errors in the data: all syndromes being equal to zero may indicate that the received vector is a valid codeword, whereas the presence of non-zero syndromes may indicate that the received vector has errors. After the syndrome generator 420 generates the syndrome vectors, the ELP solver 422 may calculate the coefficients of the error locator polynomial, whose degree indicates the number of errors in the codeword. The coefficients may be calculated using the Peterson-Gorenstein-Zierler (PGZ) algorithm and stored in an ELP coefficients register 448 of the data register 414. The error detection circuitry 406 may be implemented entirely with parallel combinational logic.
Syndromes may be computed from the received vector of one or more data words using a method to multiply the received vector with a parity matrix H as follows:
where α is the primitive element over $GF(2^m)$.
All the entries in H are elements of the Galois field $GF(2^m)$ expressed as powers of α, each of which may also be represented as a binary vector.
In other words, the syndromes may be computed by the binary matrix multiplication in Equation [6].
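The matrix of Equation [6] does not survive in this text. For reference, one common way to write the parity matrix of a binary t-error-correcting BCH code of length n, restricted to the odd-index rows, is sketched below as an assumption rather than a reproduction of Equation [6]; $r_i$ denotes the i-th bit of the received vector $\mathbf{r}$:

```latex
H =
\begin{bmatrix}
1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\
1 & \alpha^{3} & \alpha^{6} & \cdots & \alpha^{3(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{2t-1} & \alpha^{2(2t-1)} & \cdots & \alpha^{(2t-1)(n-1)}
\end{bmatrix},
\qquad
\left(S_1, S_3, \ldots, S_{2t-1}\right) = \mathbf{r}\,H^{\mathsf{T}},
\qquad
S_j = \sum_{i=0}^{n-1} r_i\,\alpha^{ij}.
```

Replacing every entry by its m-bit binary vector turns each row of H into m binary rows, which gives the binary matrix multiplication referred to above.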
For binary BCH codes, only the odd-index syndromes may need to be computed using Equation [6], because the even-index syndromes may be obtained using the property $S_{2i} = (S_i)^2$, where $i = 1, \ldots, t$, as in Equation [4].
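A small software model may illustrate the even-index shortcut. The sketch below assumes GF(2^4) generated by the primitive polynomial x^4 + x + 1 (so n = 15) and represents the received vector as a Python list of bits r[0..n-1] with r(x) = Σ r_i x^i; these parameters and the names `eval_at_alpha_power` and `syndromes` are hypothetical, chosen only for illustration. Only the odd-index syndromes are evaluated from the received bits; each even-index syndrome is the square of an earlier one.

```python
M = 4                  # field degree m (illustrative choice)
N = (1 << M) - 1       # code length n = 2^m - 1 = 15
PRIM_POLY = 0b10011    # x^4 + x + 1, primitive over GF(2)

# EXP[i] = alpha^i as a 4-bit integer (doubled to skip a modulo); LOG is its inverse.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
_x = 1
for _i in range(N):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & (1 << M):
        _x ^= PRIM_POLY
for _i in range(N, 2 * N):
    EXP[_i] = EXP[_i - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def eval_at_alpha_power(r, j):
    """Return r(alpha^j): XOR of alpha^(i*j) over every position i with r[i] == 1."""
    s = 0
    for i, bit in enumerate(r):
        if bit:
            s ^= EXP[(i * j) % N]
    return s


def syndromes(r, t):
    """Return [S_0 (unused), S_1, ..., S_2t]."""
    S = [0] * (2 * t + 1)
    for j in range(1, 2 * t, 2):      # odd indices: evaluated from the received bits
        S[j] = eval_at_alpha_power(r, j)
    for i in range(1, t + 1):         # even indices: S_2i = (S_i)^2, ascending order
        S[2 * i] = gf_mul(S[i], S[i])
    return S


r = [0] * N
r[3] = r[10] = 1                      # two bit errors on the all-zero codeword
print(syndromes(r, 2))                # [0, 15, 10, 11, 8]
```

Because the received bits are binary and the field has characteristic 2, squaring commutes with evaluation, so each even-index value needs only a squaring (a fixed linear map over GF(2)) instead of a full evaluation over all n received bits.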
As mentioned above, the syndrome values may indicate whether there are errors in the received data. If all the syndromes are zero, the received data may be a valid codeword with no detectable error; otherwise, if any syndrome is non-zero, errors are present.
Syndrome values obtained by using Equation [6] may be the same as those obtained by using Equation [3]. However, the hardware implementation of Equations [3] and [6] may be comparatively different.
Compared to calculating the remainder in Equation [3], implementing Equation [6] may be more straightforward. Each element of $GF(2^m)$ has an equivalent representation as an m-tuple binary vector; hence, the H matrix may be expressed as a simple binary matrix. Furthermore, all the element values in the matrix may be pre-determined. As a result, the syndrome calculation in Equation [6] may be reduced to modulo-2 additions of selected bits of the received vector of the one or more data words, which may be implemented simply with XOR combinational logic in hardware.
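The reduction to XOR logic may be sketched in software under illustrative assumptions: GF(2^4) generated by x^4 + x + 1, n = 15 and a hypothetical t = 2. Each column of the binary H matrix is a constant, so the packed syndrome is simply the XOR of the columns selected by the received bits; `XOR_INPUTS` lists, for every syndrome bit, the received bits that its XOR tree would take as inputs. All names are hypothetical.

```python
M = 4                  # field degree m (illustrative)
N = (1 << M) - 1       # code length n = 15
PRIM_POLY = 0b10011    # x^4 + x + 1
T = 2                  # hypothetical error-correction capability

# EXP[i] = alpha^i as an m-bit integer, i.e. its m-tuple binary representation.
EXP = [0] * N
_x = 1
for _i in range(N):
    EXP[_i] = _x
    _x <<= 1
    if _x & (1 << M):
        _x ^= PRIM_POLY


def h_column(i, t=T):
    """Binary column i of the odd-row H matrix: the m-bit vectors of alpha^(j*i)
    for j = 1, 3, ..., 2t-1, stacked into one (m*t)-bit integer."""
    col = 0
    for row, j in enumerate(range(1, 2 * t, 2)):
        col |= EXP[(j * i) % N] << (M * row)
    return col


# H is fixed at design time, so every column is a constant.
H_COLUMNS = [h_column(i) for i in range(N)]

# For each syndrome bit b, the received bits feeding its XOR tree in hardware.
XOR_INPUTS = [[i for i in range(N) if (H_COLUMNS[i] >> b) & 1] for b in range(M * T)]


def syndrome_xor(r):
    """Packed odd syndromes: modulo-2 sum of the H columns selected by r."""
    s = 0
    for bit, col in zip(r, H_COLUMNS):
        if bit:
            s ^= col
    return s


def unpack(s, t=T):
    """Split the packed word into [S_1, S_3, ..., S_(2t-1)]."""
    mask = (1 << M) - 1
    return [(s >> (M * row)) & mask for row in range(t)]


r = [0] * N
r[3] = r[10] = 1
print(unpack(syndrome_xor(r)))   # [15, 11]
```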
To obtain the coefficients of error-location polynomial, a Peterson-Gorenstein-Zierler (PGZ) algorithm may be used. In other words, the coefficients may be obtained by directly solving the PGZ equation in Equation [7]:
For a given t, the coefficients may be directly solved from Equation [7]. In contrast with the Berlekamp-Massey (BM) algorithm described above, the PGZ algorithm may remove the iterative process. Furthermore, all the coefficient expressions may be pre-calculated with software tools such as MATLAB, which may significantly facilitate the hardware implementation. When t is small (t < 5), Equation [7] may not be considered complicated, hence the solutions may be implemented with low complexity. However, when t is large (t > 5), the PGZ algorithm may not be considered advantageous, because the number of equations may grow rapidly and the expressions of the solutions may become significantly complex.
The latency of the error detection circuitry 406 may consist solely of combinational logic propagation delays. As a result, the fully parallel implementation of the error detection circuitry 406 may minimize the memory access latency overhead.
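As an illustration of a pre-calculated coefficient expression, the PGZ system for a binary BCH code with t = 2 reduces to σ1 = S1 and σ2 = (S3 + S1^3)/S1. The sketch below models that closed form in GF(2^4) generated by x^4 + x + 1; the field, the function names and the handling of S1 = 0 are assumptions made for the example, not details taken from this document.

```python
M = 4
N = (1 << M) - 1       # 15
PRIM_POLY = 0b10011    # x^4 + x + 1 (illustrative field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
_x = 1
for _i in range(N):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & (1 << M):
        _x ^= PRIM_POLY
for _i in range(N, 2 * N):
    EXP[_i] = EXP[_i - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a non-zero b in GF(2^4)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^4)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % N]


def pgz_t2(S1, S3):
    """Closed-form PGZ solution for a binary BCH code with t = 2.

    Returns (sigma1, sigma2) for sigma(x) = 1 + sigma1*x + sigma2*x^2,
    or None when the syndromes indicate an uncorrectable pattern."""
    if S1 == 0:
        # One error or two distinct errors always give S1 != 0, so S1 == 0 means
        # either no error (S3 == 0) or more than t errors (S3 != 0).
        return (0, 0) if S3 == 0 else None
    S1_cubed = gf_mul(gf_mul(S1, S1), S1)
    sigma1 = S1
    sigma2 = gf_div(S3 ^ S1_cubed, S1)   # (S3 + S1^3) / S1; zero for a single error
    return (sigma1, sigma2)


S1 = EXP[3] ^ EXP[10]            # errors at positions 3 and 10
S3 = EXP[9] ^ EXP[30 % N]
print(pgz_t2(S1, S3))            # (15, 13)
```

In hardware, such an expression maps directly onto Galois-field multipliers, an adder (XOR) and an inverter, with no iteration.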
The data register 414 may contain all the resources prepared for error correction, namely, the one or more data words in the information bits register 446 and the coefficients of the ELP in the ELP coefficients register 448. The data error correction and output process may involve the address control circuitry 428, an output control circuitry 450, the index control circuitry 424, the Chien search module 426, and an addition module 452. In the early address decoding phase, the address control circuitry 428 may send the decoded column address 432 to the output control circuitry 450 and the index control circuitry 424. In the output control circuitry 450, the column address may act as the select input of a multiplexer for data selection. In the index control circuitry 424, the column address may be used to generate the starting search index for the Chien search module 426 by using a look-up table (LUT).
With a data output command, the output control circuitry 450 may sequentially select and output the corresponding portion of data in the information bits register 446. The number of bits selected per clock cycle may be determined by the number of I/O ports, typically 8 bits to 64 bits. The Chien search circuitry 426 may be activated synchronously with the output control circuitry 450. The Chien search circuitry 426 may receive the starting search index from the index control circuitry 424 and may perform the test represented by Equation [8].
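The index control LUT may be illustrated with a short sketch. Every parameter below is hypothetical (GF(2^8), t = 2, a 16-byte information word, an 8-bit I/O port), and so is the codeword layout, which assumes systematic encoding with the parity bits in the lowest positions; with a different layout only the table contents would change.

```python
# Hypothetical parameters for illustration (not taken from the source):
M, T = 8, 2                 # GF(2^8), t = 2
PARITY_BITS = M * T         # n - k, assumed equal to m*t = 16 here
DATA_BYTES = 16             # 16-byte ECC information word
IO_WIDTH = 8                # bits delivered per output cycle (8 I/O ports)

# Assumed layout: parity bits at codeword positions 0..PARITY_BITS-1 and the
# information bits above them, so column c covers positions
# PARITY_BITS + IO_WIDTH*c .. PARITY_BITS + IO_WIDTH*(c+1) - 1.
NUM_COLUMNS = DATA_BYTES * 8 // IO_WIDTH
START_INDEX_LUT = tuple(PARITY_BITS + IO_WIDTH * c for c in range(NUM_COLUMNS))


def start_search_index(column_address):
    """Index control: map a decoded column address to the first codeword
    position that the Chien search must test for that column."""
    if not 0 <= column_address < NUM_COLUMNS:
        raise ValueError("column address out of range")
    return START_INDEX_LUT[column_address]


print(start_search_index(0), start_search_index(5))   # 16 56
```

Because the table is indexed directly by the column address, the lookup can complete during address decoding, before data output starts.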
According to the Chien search algorithm, the test at the i-th location of the received vector of the one or more data words is to check whether the following equation is satisfied:
$\sigma(\alpha^{-i}) = 0, \quad i = 0, 1, \ldots, n-1$    Equation [8]
where α is the primitive element over $GF(2^m)$.
If $\alpha^{-i}$ is a root of the error locator polynomial, an error bit may be found at location index i. The Chien search module may carry out an enumeration of the received data, that is, it may evaluate Equation [8] from index i = 0 to index i = n − 1. From Equation [8], it may be observed that the test at index i involves multiplying the coefficients $\sigma_1, \sigma_2, \ldots, \sigma_t$ by $\alpha^{-i}, (\alpha^{-i})^2, \ldots, (\alpha^{-i})^t$, respectively, and summing the results. Circuit complexity may increase linearly with the number of indices tested simultaneously. Therefore, it may be important to determine whether the index tests are conducted in parallel or serially, which may depend significantly on the BCH decoder application.
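A serial Chien search that tests one index at a time may be sketched as follows, assuming GF(2^4) generated by x^4 + x + 1 (n = 15) and a coefficient list sigma = [1, σ1, ..., σt]; the names are hypothetical. Errors at positions 3 and 10 give σ(x) = (1 + α^3 x)(1 + α^10 x), i.e. sigma = [1, 15, 13] in this field's integer representation.

```python
M = 4
N = (1 << M) - 1       # 15
PRIM_POLY = 0b10011    # x^4 + x + 1 (illustrative field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
_x = 1
for _i in range(N):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & (1 << M):
        _x ^= PRIM_POLY
for _i in range(N, 2 * N):
    EXP[_i] = EXP[_i - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(sigma):
    """Serial Chien search over all n positions.

    sigma = [1, sigma_1, ..., sigma_t]; position i is flagged when
    sigma(alpha^-i) = sum over k of sigma_k * (alpha^-i)^k equals 0."""
    t = len(sigma) - 1
    error_positions = []
    for i in range(N):
        acc = 0
        for k in range(t + 1):
            acc ^= gf_mul(sigma[k], EXP[(-k * i) % N])   # sigma_k * alpha^(-k*i)
        if acc == 0:
            error_positions.append(i)
    return error_positions


print(chien_search([1, 15, 13]))   # [3, 10]
```

This serial form needs only one tester but n clock cycles per codeword, which is the latency a parallel or partial-parallel structure trades area against.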
When the test in Equation [8] is done, the Chien search circuitry 426 may generate the error indicators for the corresponding data locations. In various examples, the degree of parallelism of the Chien search circuitry 426, that is, the number of bits processed in each clock cycle, may be set equal to the number of output data bits from the output control circuitry 450, which may in turn be determined by the number of I/O ports. With such a configuration, in each clock cycle the raw information data from the output control circuitry 450 may at least substantially or exactly match its corresponding error indicators from the Chien search module 426. The errors may be removed by adding the raw data and its corresponding error indicators (i.e., bitwise modulo-2 addition, an XOR) in the addition module 452. Finally, a valid word may be sent to the I/O circuitry 418.
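The addition module amounts to a bitwise modulo-2 addition. The sketch below assumes a hypothetical 8-bit I/O width, with each data word and its error indicator word packed into a Python integer and produced in the same clock cycle; the function names are hypothetical.

```python
IO_WIDTH = 8   # hypothetical I/O width in bits


def remove_errors(raw_word, error_indicators):
    """GF(2) addition of a raw data word and its error indicators (bitwise XOR):
    exactly the bits flagged by the Chien search are flipped."""
    mask = (1 << IO_WIDTH) - 1
    return (raw_word ^ error_indicators) & mask


def corrected_burst(raw_words, indicator_words):
    """Pair each output-control word with the indicator word of the same clock
    cycle and correct them one cycle at a time."""
    if len(raw_words) != len(indicator_words):
        raise ValueError("each data word needs one indicator word")
    return [remove_errors(d, e) for d, e in zip(raw_words, indicator_words)]


print(bin(remove_errors(0b10110010, 0b00001000)))   # 0b10111010
```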
In another example, the Chien search circuitry 426 may be configured such that the starting search index of Chien search may be generated from the memory column address 432 with the index control circuitry 424. The degree of parallelism for the Chien search module 426 may be equal to the number of output data from the output control circuitry 450, which may, in turn, be determined by the number of memory I/O ports.
Typically, the degree of parallelism for the Chien search module 426 may be equal to the number of I/O ports, or double the number of I/O ports if a double data rate (DDR) interface is used. The principal advantage may be that the Chien search module 426 has a much smaller area because of the limited number of I/O ports. In addition, the Chien search module 426 may support memory burst read operation, because the intermediate results in the Chien search module 426 may be registered, so that the error indicators output in the next cycle may correspond to the next column address.
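The partial-parallel, burst-capable search may be sketched as follows, assuming GF(2^4) generated by x^4 + x + 1 (n = 15), p positions tested per clock cycle and a starting index supplied by the index control circuitry; the register layout and names are assumptions for illustration. In each cycle the p testers share the registered values σ_k·α^(-k·i0), and multiplying them by the constants α^(-k·p) prepares the next cycle, which corresponds to the next column address of a burst.

```python
M = 4
N = (1 << M) - 1       # 15
PRIM_POLY = 0b10011    # x^4 + x + 1 (illustrative field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
_x = 1
for _i in range(N):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & (1 << M):
        _x ^= PRIM_POLY
for _i in range(N, 2 * N):
    EXP[_i] = EXP[_i - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_burst(sigma, start, p, cycles):
    """Partial-parallel Chien search testing p positions per clock cycle.

    regs[k] holds sigma_k * alpha^(-k*i0), where i0 is the first position of
    the current cycle; after each cycle it is multiplied by alpha^(-k*p) and
    registered, so the next cycle covers the next p positions."""
    if start + p * cycles > N:
        raise ValueError("burst runs past the end of the codeword")
    t = len(sigma) - 1
    regs = [gf_mul(sigma[k], EXP[(-k * start) % N]) for k in range(t + 1)]
    step = [EXP[(-k * p) % N] for k in range(t + 1)]
    out = []
    for _ in range(cycles):
        flags = []
        for j in range(p):                       # p testers operating in parallel
            acc = 0
            for k in range(t + 1):
                acc ^= gf_mul(regs[k], EXP[(-k * j) % N])
            flags.append(1 if acc == 0 else 0)   # 1 = error indicator set
        out.append(flags)
        regs = [gf_mul(regs[k], step[k]) for k in range(t + 1)]   # next cycle
    return out


sigma = [1, 15, 13]                              # errors at positions 3 and 10
for flags in chien_burst(sigma, start=0, p=4, cycles=3):
    print(flags)
```

In hardware, the multiplications by α^(-k·j) and α^(-k·p) would be constant multipliers (fixed XOR networks), so the tester area grows with p rather than with the codeword length n.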
In contrast with conventional implementation, for example, as shown in
With such an architecture, the memory read access latency overhead due to ECC may be reduced. Since the error correction circuitry 408 may operate synchronously with the data output process, its decoding latency may be eliminated or at least minimized. Consequently, the read access overhead may be reduced from the latency of the whole BCH decoder to that of the error detection circuitry 406. The decoder area may also be reduced due to the partial-parallel circuit structure of the Chien search module 426. As a result, both memory access latency and decoder area may be reduced.
The hardware implementation of the coefficient expressions is shown in
Syndrome and square of syndrome are the basic components to implement the coefficient expressions. Operations in Table 1 involve multiplications and additions in a Galois field, which may be implemented in the process elements (PE) 604 in
In an example, read access time overhead may be reduced by more than 30%. Table 4 shows a set of comparison data of read access time overhead using ECC codeword lengths of 16 bytes and 32 bytes obtained from memory devices in accordance with various embodiments (e.g., implemented with a Xilinx Virtex-7) and a conventional memory device (e.g., as in
The decoder area for the BCH decoder in accordance with various embodiments may be significantly reduced as compared to that for a conventional decoder. For example, Table 5 shows a set of comparison results of the area of a 16-byte BCH decoder in accordance with various embodiments and a conventional decoder, both obtained with the number of memory I/O pins equal to 8, while Table 6 shows a set of comparison results of the area of a 32-byte BCH decoder in accordance with various embodiments, obtained with a Chien search parallel degree of 8, and a conventional decoder.
It is observed from Tables 5 and 6 that the reduction in decoder area may be mainly contributed by the Chien search module of the BCH decoder in accordance with various embodiments.
A low-latency and area-efficient BCH decoder in accordance with various embodiments may be provided and designed specifically for memory.
The BCH decoder may take full advantage of a unique feature of the memory read path, each portion of the BCH decoder being designed in association with a data flow path in the memory and having a specific circuit structure. The BCH decoder may achieve better performance than conventional decoders in terms of reduced memory access time and reduced BCH decoder area. The BCH decoder in accordance with various embodiments may be widely used for STT-MRAM, PCM and ReRAM. The error correction capability of the BCH decoder may be less than or equal to 5. The maximum operating frequency of the Chien search engine (or module) may determine the I/O interfaces to which the decoder may be applied. A control signal may be required to activate the index control circuitry and the Chien search engine (or interchangeably referred to as the Chien search module).
While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
Number | Date | Country | Kind |
---|---|---|---|
10201401824Q | Apr 2014 | SG | national |