This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2022-193213, filed on Dec. 2, 2022; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an arithmetic circuitry, a memory system, and a control method.
In a memory system, to protect data to be stored in a memory such as a NAND flash memory, error correction encoded data is stored in the memory. For this reason, when the data stored in the memory is read, the error correction encoded data (also referred to as received word) read from the memory is decoded to restore the data before the error correction encoding.
In the technique related to an error correction code, the multiplication of a Galois field (finite field) may be performed. For example, in decoding a Bose-Chaudhuri-Hocquenghem (BCH) code, which is an example of the error correction code, a syndrome is calculated from a received word (read sequence) read from the memory, and a coefficient of an error locator polynomial is calculated from the syndrome. The syndrome is an element of the Galois field. For this reason, when the coefficient of the error locator polynomial is calculated, the multiplication of the syndrome, that is, the multiplication of the Galois field may be performed. As the number of error-correctable bits (t bits, t is an integer of 2 or more) increases, the number of required multiplications of the Galois field increases. That is, the scale of an arithmetic circuitry (multiplier) used for the multiplication increases.
In general, according to one embodiment, an arithmetic circuitry is configured to: calculate an AND value that is a result of an AND operation of an element a and an element b of a Galois field; and calculate, for each of a plurality of mutually different sets of (u, v), a^(2^u)×b^(2^v), which is a product of a 2^u-th power of a and a 2^v-th power of b, from an XOR operation based on the AND value and a connected tensor obtained by collecting a plurality of tensors different for each set.
Exemplary embodiments of an arithmetic circuitry will be described below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments. Hereinafter, a memory system including an arithmetic circuitry that performs multiplication of a Galois field when decoding an error correction code will be described as an example. A configuration using the arithmetic circuitry is not limited to this example, and any system (apparatus or device) may be used as the configuration. For example, the arithmetic circuitry described below can also be applied to a memory system that performs the multiplication of the Galois field when calculating an error position, a system that performs the multiplication of the Galois field during cipher processing, and the like.
First, a memory system according to the present embodiment will be described in detail with reference to the drawings.
The non-volatile memory 20 is a non-volatile memory that stores data in a non-volatile manner, and is, for example, a NAND flash memory (hereinafter, simply referred to as a NAND memory). Although a case where a NAND memory is used as the non-volatile memory 20 will be exemplified in the following description, a storage device other than the NAND memory, such as a three-dimensional structure flash memory, a resistive random access memory (ReRAM), or a ferroelectric random access memory (FeRAM), can be used as the non-volatile memory 20. In addition, the non-volatile memory 20 is not necessarily a semiconductor memory, and the present embodiment can also be applied to various storage media other than the semiconductor memory.
The memory system 1 may be various memory systems including the non-volatile memory 20, such as a so-called solid state drive (SSD) or a memory card in which the memory controller 10 and the non-volatile memory 20 are configured as one package.
The memory controller 10 controls writing to the non-volatile memory 20 in accordance with a write request from the host 30. In addition, the memory controller 10 controls reading from the non-volatile memory 20 in accordance with a read request from the host 30. The memory controller 10 is, for example, a semiconductor integrated circuit configured as a system on a chip (SoC). The memory controller 10 includes a host interface (host I/F) 15, a memory interface (memory I/F) 13, a control unit 11, an encoding/decoding unit (CODEC) 14, and a data buffer 12. The host I/F 15, the memory I/F 13, the control unit 11, the encoding/decoding unit 14, and the data buffer 12 are mutually connected by an internal bus 16. Some or all of operations of each component of the memory controller 10 described below may be implemented by a central processing unit (CPU) executing firmware or may be implemented by hardware.
The host I/F 15 performs processing according to an interface standard with the host 30, and outputs a command received from the host 30, user data to be written, and the like to the internal bus 16. In addition, the host I/F 15 transmits user data read from the non-volatile memory 20 and restored, a response from the control unit 11, and the like to the host 30.
The memory I/F 13 performs write processing to the non-volatile memory 20 based on an instruction from the control unit 11. In addition, the memory I/F 13 performs read processing from the non-volatile memory 20 based on the instruction from the control unit 11.
The control unit 11 integrally controls each component of the memory system 1. When a command is received from the host 30 via the host I/F 15, the control unit 11 performs control according to the command. For example, the control unit 11 instructs the memory I/F 13 to write the user data and a parity to the non-volatile memory 20 in accordance with a command from the host 30. In addition, the control unit 11 instructs the memory I/F 13 to read the user data and the parity from the non-volatile memory 20 in accordance with a command from the host 30.
In addition, when a write request is received from the host 30, the control unit 11 determines a storage area (memory area) on the non-volatile memory 20 for the user data accumulated in the data buffer 12. That is, the control unit 11 manages a write destination of the user data. The correspondence between the logical address of the user data received from the host 30 and the physical address indicating the storage area on the non-volatile memory 20 storing the user data is stored as an address conversion table.
In addition, when a read request is received from the host 30, the control unit 11 converts a logical address designated by the read request into a physical address using the above-described address conversion table, and instructs the memory I/F 13 to perform reading from the physical address.
In the NAND memory, writing and reading are generally performed in data units called pages, and erasing is performed in data units called blocks. In the present embodiment, a plurality of memory cells connected to the same word line is referred to as a memory cell group. When the memory cell is a single level cell (SLC), one memory cell group corresponds to one page. When the memory cell is a multiple level cell (MLC), one memory cell group corresponds to a plurality of pages. In the present description, the MLC includes a triple level cell (TLC), a quad level cell (QLC), and the like. Each memory cell is connected to the word line and is also connected to a bit line. Therefore, each memory cell can be identified by an address for identifying the word line and an address for identifying the bit line.
The data buffer 12 temporarily stores the user data received from the host 30 by the memory controller 10 until the user data is stored in the non-volatile memory 20. In addition, the data buffer 12 temporarily stores the user data read from the non-volatile memory 20 until the user data is transmitted to the host 30. As the data buffer 12, for example, a general-purpose memory such as a static random access memory (SRAM) or a dynamic random access memory (DRAM) can be used. Note that the data buffer 12 may be mounted outside the memory controller 10 instead of being built in the memory controller 10.
The user data transmitted from the host 30 is transferred to the internal bus 16 and temporarily stored in the data buffer 12. The encoding/decoding unit 14 encodes the user data to be stored in the non-volatile memory 20 and generates a code word. In addition, the encoding/decoding unit 14 decodes the received word read from the non-volatile memory 20 and restores the user data. Therefore, the encoding/decoding unit 14 includes an encoder 17 and a decoder 18. Note that the data encoded by the encoding/decoding unit 14 may include control data or the like used inside the memory controller 10 in addition to the user data.
Next, the write processing of the present embodiment will be described. The control unit 11 instructs the encoder 17 to encode the user data during writing to the non-volatile memory 20. At that time, the control unit 11 determines a storage location (storage address) of the code word in the non-volatile memory 20, and also instructs the memory I/F 13 on the determined storage location.
The encoder 17 encodes the user data on the data buffer 12 based on the instruction from the control unit 11 and generates a code word. As the encoding method, for example, an encoding method using an algebraic code such as a Bose-Chaudhuri-Hocquenghem (BCH) code and a Reed-Solomon (RS) code, and an encoding method (product code or the like) using these codes as component codes in the row direction and the column direction can be employed. The memory I/F 13 performs control to store the code word in the storage location on the non-volatile memory 20 instructed from the control unit 11. Hereinafter, a case of using a BCH code for correcting an error of t bits (t is an integer of 2 or more) or less will be described as an example.
Next, processing during reading from the non-volatile memory 20 of the present embodiment will be described. The control unit 11 designates an address on the non-volatile memory 20 and instructs the memory I/F 13 to perform reading during reading from the non-volatile memory 20. In addition, the control unit 11 instructs the decoder 18 to start decoding. The memory I/F 13 reads a received word from a designated address of the non-volatile memory 20 in accordance with the instruction of the control unit 11, and inputs the read received word to the decoder 18. The decoder 18 decodes the received word read from the non-volatile memory 20.
The decoder 18 calculates an error locator polynomial using, for example, the Peterson-Gorenstein-Zierler (PGZ) method. The PGZ method solves, by matrix calculation, the simultaneous equations that hold between the coefficients σ of an error locator polynomial and the syndromes.
The syndrome calculation unit 101 calculates a syndrome using a received word (read sequence) read from the non-volatile memory 20. The syndrome calculation unit 101 may calculate the syndrome based on any conventionally used method. When the values of all the syndromes are 0, it can be determined that the received word has no errors, and thus the decoder 18 can end the decoding processing without performing the subsequent processing.
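The syndrome calculation and the all-zero check can be illustrated with a minimal sketch. It assumes the small field GF (2^4) with p (x)=x^4+x+1 that appears later in the description, and the textbook definition in which the j-th syndrome is the received polynomial evaluated at α^j. The names EXP, syndromes, and has_no_error are hypothetical and are not taken from the embodiment, which may use any other method.

```python
M = 4
PRIM = 0b10011          # p(x) = x^4 + x + 1 (field assumed for this sketch)
N = (1 << M) - 1        # code length n = 2^m - 1 = 15

# EXP[i] holds alpha^i as an integer whose bit k is the coefficient of alpha^k.
EXP = []
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM       # reduce with alpha^4 = alpha + 1

def syndromes(received, t):
    """Return [S_1, ..., S_2t], where S_j is the received polynomial at alpha^j."""
    result = []
    for j in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(received):
            if bit:
                s ^= EXP[(i * j) % N]   # Galois-field addition is bitwise XOR
        result.append(s)
    return result

def has_no_error(received, t):
    """All syndromes equal to 0 means no error is detected."""
    return all(s == 0 for s in syndromes(received, t))

word = [0] * N                  # the all-zero word is a code word of any linear code
print(has_no_error(word, 2))    # True
word[3] ^= 1                    # inject a single bit error at position 3
print(has_no_error(word, 2))    # False
```

With a single error at position 3, the syndromes in this sketch are α^3, α^6, α^9, and α^12, which is why they cannot all be 0.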
The error locator polynomial calculation unit 102 calculates the error locator polynomial based on the PGZ method using the syndrome. Some of the coefficients of the error locator polynomials are calculated by adding and multiplying the syndromes.
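To illustrate how a coefficient is obtained by adding and multiplying syndromes, the following sketch uses the well-known two-error special case of a binary BCH code, σ1=S_1 and σ2=(S_3+S_1^3)/S_1. This is a textbook formula used here as an assumption, not the set of formulas of the embodiment, and the field GF (2^4) and all function names are hypothetical.

```python
M = 4
PRIM = 0b10011          # p(x) = x^4 + x + 1 (field assumed for this sketch)
N = (1 << M) - 1

# Exponent and logarithm tables for GF(2^4).
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    """Multiplicative inverse; a must be nonzero."""
    return EXP[(N - LOG[a]) % N]

def pgz_two_errors(s1, s3):
    """Return [1, sigma1, sigma2] for at most two errors; assumes s1 != 0."""
    s1_cubed = gf_mul(gf_mul(s1, s1), s1)
    sigma2 = gf_mul(s3 ^ s1_cubed, gf_inv(s1))
    return [1, s1, sigma2]

# Two errors with locators alpha^2 and alpha^11:
# S_1 = X1 + X2 and S_3 = X1^3 + X2^3.
s1 = EXP[2] ^ EXP[11]
s3 = EXP[6] ^ EXP[33 % N]
sigma = pgz_two_errors(s1, s3)
print(sigma == [1, EXP[2] ^ EXP[11], EXP[13]])   # True: sigma2 = X1 * X2
```

Even in this small case the coefficient requires the product S_1^3, which hints at why the number of syndrome multiplications grows with t.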
The first-degree to fourth-degree polynomials are formulas for calculating error positions when the number of errors is 1 to 4, respectively. In the example of
Therefore, in the present embodiment, an arithmetic circuitry optimized to be able to commonly calculate at least a part of multiplication of syndromes used in the calculation of the coefficients of the error locator polynomial (arithmetic unit 110 in
For example, in
The arithmetic unit 110 is configured to efficiently calculate a plurality of multiplications including a common multiplication as described above. Hereinafter, a case where the arithmetic unit 110 is configured to calculate a set of the multiplications 301 to 304 including the multiplication of the syndromes S1 and S3 will be described as an example. The arithmetic unit 110 may be configured to calculate another set of multiplications, or may be configured to calculate two or more sets of multiplications.
Returning to
The AND calculation unit 111 performs an AND operation for performing the multiplication of the elements of the Galois field. The XOR calculation unit 112 performs an XOR operation for performing the multiplication of the elements of the Galois field. Details of the AND calculation unit 111 and the XOR calculation unit 112 will be described later.
Note that the arithmetic unit 110 is a configuration unit that performs at least part of the multiplication of syndromes required for the calculation of the coefficients of the error locator polynomial. The multiplication of syndromes not performed by the arithmetic unit 110 is calculated, for example, by the error locator polynomial calculation unit 102. In this case, the error locator polynomial calculation unit 102 may calculate the coefficient (including the multiplication of the syndrome) based on any conventionally used method.
The error position calculation unit 103 calculates the error position using the error locator polynomial calculated by the error locator polynomial calculation unit 102. The processing for calculating the error position (search processing) may be implemented by any method, and for example, Chien search can be used. The Chien search is a method of sequentially substituting a value into an error locator polynomial and searching for the error position based on a value at which the output value of the error locator polynomial becomes 0.
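The Chien search described above can be sketched as follows. The sketch assumes GF (2^4) with p (x)=x^4+x+1 and the common convention that an error at position i corresponds to a root α^(-i) of the error locator polynomial; both are assumptions for illustration, and the function names are hypothetical.

```python
M = 4
PRIM = 0b10011          # p(x) = x^4 + x + 1 (field assumed for this sketch)
N = (1 << M) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_eval(coeffs, point):
    """Horner evaluation; coeffs[d] is the coefficient of x^d."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc

def chien_search(sigma):
    """Substitute alpha^(-i) for every position i and keep the positions that give 0."""
    return [i for i in range(N) if poly_eval(sigma, EXP[(N - i) % N]) == 0]

# sigma(x) = (1 + alpha^2 x)(1 + alpha^7 x), i.e. errors at positions 2 and 7.
sigma = [1, EXP[2] ^ EXP[7], gf_mul(EXP[2], EXP[7])]
print(chien_search(sigma))   # [2, 7]
```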
The bit flipping unit 104 performs error correction by inverting (bit flipping) the bit at the error position calculated by the search processing.
Next, details of the arithmetic unit 110 (AND calculation unit 111 and XOR calculation unit 112) will be described. The arithmetic unit 110 performs, for example, the following procedure.
(P1) Determine m defining the number of elements 2^m of the Galois field and a primitive polynomial p (x) of m-th degree.
(P2) Obtain a companion matrix corresponding to the primitive polynomial p (x).
(P3) Obtain a plurality of tensors used in the multiplication of the elements of the Galois field using the companion matrix. When an element of the Galois field is expressed by an m-dimensional vector, an element which is an output obtained by multiplying two elements is also expressed by the m-dimensional vector. The tensor is obtained for each component of the element of the output expressed by the m-dimensional vector. That is, a total of m tensors, each of which is a function whose input is two vectors and whose output is one value (one component of the vector), are obtained. Hereinafter, a tensor defined for the i-th component (i is an integer satisfying 0≤i≤m−1) of the m-dimensional vector is represented as T_i. The tensor T_i may be referred to as a second-rank tensor because the number of input vectors is two. In addition, the entire m second-rank tensors are characterized by three subscripts, and thus may be referred to as a third-rank tensor.
(P4) A second-rank tensor S representing the square of the elements of the Galois field is obtained.
(P5) For a set of multiplications to be calculated by the arithmetic unit 110 (for example, the multiplications 301 to 304), each multiplication is rewritten by an AND operation of components of an m-dimensional vector, which are two commonly included arguments, and a tensor described by the tensor T_i and the tensor S. The number of rewritten tensors obtained is the number of multiplications included in the set of multiplications. In the set of the multiplications 301 to 304, four tensors are obtained since four types of multiplications (S_1^2 S_3, S_1 S_3, S_1^4 S_3, and S_1 S_3^2) are included.
(P6) A tensor combining the plurality of obtained tensors (hereinafter referred to as a connected tensor) is calculated.
(P7) Among a plurality of XOR operations represented by the connected tensor, an XOR operation that can be shared is shared (optimized).
(P8) The arithmetic unit 110 is configured to perform an XOR operation according to the connected tensor after sharing.
Each of the above procedures will be further described below.
Regarding (P1), the definition of the Galois field will be described. The Galois field is determined by m ∈ {1, 2, 3, ...} defining the number of elements and a primitive polynomial p (x) of m-th degree. The Galois field has the following characteristics.
In the BCH code having a code length n=2^m−1 (m is an integer of 2 or more), for example, a Galois field GF (2^m) having 2^m elements is used. The element of the Galois field GF (2^m) can be represented by an m-bit vector (m-dimensional vector).
For example, any element a ∈ GF (2^m) of the Galois field GF (2^m) can be expressed by a polynomial on GF (2) of (m−1)-th degree with respect to the primitive element α as in the following formula (2). Note that i is an integer satisfying 0≤i≤m−1.
Therefore, the element a can be expressed by an m-dimensional vector on GF (2) having a coefficient of a polynomial on GF (2) as a component, as illustrated in the following formula (3).
For example, the element of the Galois field GF (2^10) used in the BCH code having a code length of n=2^10−1 can be represented by a 10-bit vector. In addition, the element of the Galois field GF (2^4) used with the BCH code having a code length of n=2^4−1 can be represented by a 4-bit vector.
For example, the element 0 is represented by a polynomial 0α^0+0α^1+0α^2+0α^3, and is represented by a four-dimensional vector (0, 0, 0, 0) having the coefficients of the polynomial as components. Since the primitive element α satisfies the primitive polynomial p (α)=α^4+α+1=0, in other words, satisfies α^4=α+1, the element α^4 can be transformed into 1+α. Therefore, for example, the element α^4 is represented by a polynomial 1α^0+1α^1+0α^2+0α^3, and is represented by a four-dimensional vector (1, 1, 0, 0) having the coefficients of the polynomial as components.
In addition, the addition of the elements of the Galois field is represented by the XOR of two vectors for each bit. In the present embodiment, the multiplication of the element of the Galois field is performed by an arithmetic circuitry combining an AND operation and an XOR operation.
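The vector representation and the XOR-based addition can be checked with a short sketch for GF (2^4) with p (x)=x^4+x+1. The helper names mul_alpha, add, and alpha_pow are hypothetical; the tuple (a_0, a_1, a_2, a_3) holds the coefficients of α^0 to α^3 as in the text.

```python
M = 4
# alpha^4 = 1 + alpha for p(x) = x^4 + x + 1, i.e. the vector (1, 1, 0, 0).
FEEDBACK = (1, 1, 0, 0)

def mul_alpha(a):
    """Multiply the element a = (a_0, ..., a_3) by the primitive element alpha."""
    top = a[-1]                       # coefficient that would become alpha^4
    shifted = (0,) + a[:-1]           # every alpha^i moves to alpha^(i+1)
    return tuple(s ^ (top & f) for s, f in zip(shifted, FEEDBACK))

def add(a, b):
    """Addition in the Galois field is the bitwise XOR of the two vectors."""
    return tuple(x ^ y for x, y in zip(a, b))

def alpha_pow(n):
    """Vector representation of alpha^n."""
    v = (1, 0, 0, 0)
    for _ in range(n):
        v = mul_alpha(v)
    return v

print(alpha_pow(4))                         # (1, 1, 0, 0)
print(add(alpha_pow(4), alpha_pow(1)))      # (1, 0, 0, 0), since alpha^4 + alpha = 1
```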
In (P1), first, m that defines the number of elements of the Galois field to be calculated is determined. The value of m may be determined in any manner, and may be determined according to, for example, the encoding system applied and the type of memory applied. For example, in a case where the memory system 1 uses a BCH code having a code length of 1000 bits, m is set to 10 so that the number of elements (2^10=1024>1000) exceeds the code length. In the Galois field, one or more primitive polynomials are defined for each value of m. Among these primitive polynomials, one primitive polynomial to be used is determined.
Next, (P2) will be described. After the primitive polynomial p (x) is determined, a matrix called a companion matrix is determined.
The multiplication of the element a and the primitive element α of the Galois field can be expressed by the multiplication of the companion matrix C and the vector representing the element a. The multiplication with the primitive element α can be divided into a calculation corresponding to right shift and a calculation corresponding to feedback (FB). The circuit in
For example, the multiplication by α in the Galois field GF (2^4) in which the primitive polynomial is represented by p (x)=x^4+x+1 can be represented by the multiplication of the companion matrix C and the element a as illustrated in the formula in the lower part of
Next, (P3) will be described. In (P3), m tensors T_i used in the multiplication of the elements of the Galois field are obtained using the companion matrix. If m=10, 10 tensors from T_0 to T_9 are obtained.
Next, (P4) will be described. The tensor S representing the square of the element of the Galois field can be obtained from the tensor T_i.
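One possible way to carry out (P2) and (P3) is sketched below for GF (2^4) with p (x)=x^4+x+1. It assumes that the tensor component T_i[j][k] is the coefficient of α^i in α^(j+k), obtained by applying the companion matrix repeatedly to the vector of α^0; this indexing and the function names are assumptions made for illustration.

```python
M = 4
P_LOW = [1, 1, 0, 0]    # coefficients of x^0..x^3 in p(x) = x^4 + x + 1

def companion(p_low):
    """Companion matrix C such that C * a is the vector of alpha * a."""
    m = len(p_low)
    # Column j holds alpha * alpha^j: a shift for j < m-1, the feedback taps for j = m-1.
    return [[(1 if i == j + 1 else 0) if j < m - 1 else p_low[i]
             for j in range(m)] for i in range(m)]

def mat_vec(mat, vec):
    """Matrix-vector product over GF(2)."""
    return [sum(r & v for r, v in zip(row, vec)) % 2 for row in mat]

def build_tensors(p_low):
    """Return [T_0, ..., T_(m-1)], each an m x m matrix of 0/1 components."""
    m = len(p_low)
    c = companion(p_low)
    powers = [[1] + [0] * (m - 1)]                 # alpha^0
    for _ in range(2 * m - 2):
        powers.append(mat_vec(c, powers[-1]))      # alpha^(n+1) = C * alpha^n
    return [[[powers[j + k][i] for k in range(m)] for j in range(m)]
            for i in range(m)]

def gf_mul(tensors, a, b):
    """(a x b)_i is the XOR of a_j AND b_k over the components where T_i[j][k] = 1."""
    m = len(a)
    return [sum(t[j][k] & a[j] & b[k] for j in range(m) for k in range(m)) % 2
            for t in tensors]

T = build_tensors(P_LOW)
print(T[0])                                        # ones at (0,0), (1,3), (2,2), (3,1)
print(gf_mul(T, [0, 1, 0, 0], [0, 0, 0, 1]))       # alpha * alpha^3 = [1, 1, 0, 0]
```

The printed T_0 corresponds to (a×b)_0 = a_0b_0+a_1b_3+a_2b_2+a_3b_1 in this field.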
The reason why only the diagonal components are extracted is that the components symmetrical with respect to the main diagonal are canceled and become 0. For example, (a×a)_0 corresponding to the tensor T_0 is a_0a_0+a_2a_2+a_3a_1+a_1a_3. In GF (2), the square a^2 of the element a is equal to a, and the addition (a+a) of the same element a is 0. For example, a_0a_0 and a_2a_2 are a_0 and a_2, respectively. In addition, a_3a_1+a_1a_3 is the addition of the same element and thus 0. Therefore, a_0a_0+a_2a_2+a_3a_1+a_1a_3 is expressed as a_0+a_2.
Next, (P5) will be described. Hereinafter, an example of rewriting the multiplications S_1 S_3, S_1^2 S_3, S_1^4 S_3, and S_1 S_3^2 included in the set of the multiplications 301 to 304 will be described.
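The diagonal extraction can be sketched as follows for GF (2^4) with p (x)=x^4+x+1. The sketch assumes that row i of S is the main diagonal of T_i, so that the square of a is the matrix-vector product of S and a over GF (2); the names are hypothetical.

```python
from itertools import product

M = 4
FEEDBACK = [1, 1, 0, 0]     # alpha^4 = 1 + alpha for p(x) = x^4 + x + 1

def alpha_powers(count):
    """Vectors of alpha^0 .. alpha^(count-1), built by shift and feedback."""
    powers = [[1] + [0] * (M - 1)]
    while len(powers) < count:
        prev = powers[-1]
        shifted = [0] + prev[:-1]
        powers.append([s ^ (prev[-1] & f) for s, f in zip(shifted, FEEDBACK)])
    return powers

POW = alpha_powers(2 * M - 1)
# T[i][j][k] = coefficient of alpha^i in alpha^(j+k)
T = [[[POW[j + k][i] for k in range(M)] for j in range(M)] for i in range(M)]
# Row i of S is the main diagonal of T_i; off-diagonal pairs cancel in GF(2).
S = [[T[i][j][j] for j in range(M)] for i in range(M)]

def square(a):
    """a^2 computed with XOR operations only (no AND of two inputs)."""
    return [sum(S[i][j] & a[j] for j in range(M)) % 2 for i in range(M)]

def gf_mul(a, b):
    """Full multiplication through the tensors, used here only as a reference."""
    return [sum(T[i][j][k] & a[j] & b[k] for j in range(M) for k in range(M)) % 2
            for i in range(M)]

print(S[0])     # [1, 0, 1, 0], i.e. (a x a)_0 = a_0 + a_2
print(all(square(list(a)) == gf_mul(list(a), list(a))
          for a in product((0, 1), repeat=M)))      # True
```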
As illustrated in the middle of
Therefore, in the present embodiment, the format illustrated in the center of
As described above, the tensor used in each multiplication is represented by the following formula (5) including the tensor T_i, the linear operation S^u representing the 2^u-th power, and the linear operation S^v representing the 2^v-th power.
In the example of
For S_1 S_3, u=0 and v=0, so (S^0)^T×T_i×S^0=T_i.
For S_1^2 S_3, u=1 and v=0, so (S^0)^T×T_i×S^1=T_i S.
For S_1^4 S_3, u=2 and v=0, so (S^0)^T×T_i×S^2=T_i S^2.
For S_1 S_3^2, u=0 and v=1, so (S^1)^T×T_i×S^0=S^T T_i.
In the rewritten format, since there is only one pair of arguments, S_1 and S_3, only one type of AND operation, (S_1)_j(S_3)_k, is required. Therefore, the scale of the circuit for the AND operation can be reduced.
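The reason the rewriting works can be written out as a short derivation. It is a sketch under two assumptions that are not stated explicitly in the text: the matrix S acts on a column vector so that a^2 = S a, and the right-hand index of each tensor is contracted with a while the left-hand index is contracted with b. All sums are over GF (2).

```latex
\begin{aligned}
\left(a^{2^u} \times b^{2^v}\right)_i
  &= \sum_{j,k} \left(S^{v} b\right)_j \,(T_i)_{jk}\, \left(S^{u} a\right)_k \\
  &= \sum_{j',k'} b_{j'} \,\Bigl[(S^{v})^{\mathsf T}\, T_i\, S^{u}\Bigr]_{j'k'}\, a_{k'}
\end{aligned}
```

The products b_{j'} a_{k'} do not depend on (u, v), so the same AND results can feed every multiplication in the set; only the 0/1 pattern of the tensor in brackets changes.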
Next, (P6) will be described. The plurality of tensors determined for each multiplication are combined into one connected tensor.
For example, the connected tensor is obtained by converting each of the plurality of tensors into a one-dimensional vector and using the converted vector as a vector of each row. In
As described above, the tensor T_i is obtained for each component of the m-dimensional vector representing the element of the Galois field. Therefore, in the example of
In the case of m=10, the connected tensor is a tensor obtained by connecting 40 (4 types×10) tensors.
Next, (P7) will be described. (P7) is a procedure for sharing an operation that can be shared among a plurality of XOR operations defined by the connected tensor. As a result, the scale of the circuit for the XOR operation can be reduced.
A shared variable c=a_3+a_6+a_7+a_8, which collects the terms common to these two components, is defined. When the shared variable is used, the 0-th component (α^452 a)_0 is represented as a_2+a_5+c, and the first component (α^452 a)_1 is represented as a_0+a_4+a_9+c. As a result, the number of required XOR operations is reduced to eight.
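The effect of the shared variable can be checked with a small sketch. It assumes that, before sharing, the two components are the plain sums a_2+a_5+a_3+a_6+a_7+a_8 and a_0+a_4+a_9+a_3+a_6+a_7+a_8 implied by the shared form; the function names are hypothetical, and the XOR counts in the comments are counted from the expressions as written.

```python
from itertools import product

def unshared(a):
    """Both components computed independently: 5 + 6 = 11 XOR operations."""
    y0 = a[2] ^ a[5] ^ a[3] ^ a[6] ^ a[7] ^ a[8]            # 5 XORs
    y1 = a[0] ^ a[4] ^ a[9] ^ a[3] ^ a[6] ^ a[7] ^ a[8]     # 6 XORs
    return y0, y1

def shared(a):
    """Common terms computed once: 3 + 2 + 3 = 8 XOR operations."""
    c = a[3] ^ a[6] ^ a[7] ^ a[8]       # 3 XORs, reused by both components
    y0 = a[2] ^ a[5] ^ c                # 2 XORs
    y1 = a[0] ^ a[4] ^ a[9] ^ c         # 3 XORs
    return y0, y1

# Exhaustive check over all 10-bit inputs that sharing does not change the result.
print(all(unshared(a) == shared(a) for a in product((0, 1), repeat=10)))   # True
```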
The sharing of XOR operations may be obtained by any method, and is obtained by, for example, the following method.
In the tensor in
A connected tensor before the sharing is referred to as a pre-deformation tensor. The pre-deformation tensor is generated as a tensor in which a vector obtained by one-dimensionalizing a j-th (j is 0≤j≤J−1, and J is the total number of the plurality of tensors) tensor among the plurality of tensors obtained in the procedure (P6) is set as a j-th row vector.
The tensor (connected tensor) after the sharing is generated by, for example, the following procedure.
In the present embodiment, the XOR operation is shared based on the connected tensor obtained by connecting the plurality of tensors. Since the connected tensor includes more rows, the possibility of finding the XOR operation that can be shared can be increased. Therefore, the scale of the circuit for the XOR operation can be more efficiently reduced. Note that the pre-deformation tensor may be used as the connected tensor without performing the sharing of (P7).
Next, (P8) will be described. In (P8), the arithmetic unit 110 is configured (designed) to perform the XOR operation according to the connected tensor after the sharing. The configuration of the arithmetic unit 110 may be performed by any conventionally used method. For example, a method using a tool or the like that performs circuit design in a hardware description language at the register transfer level (RTL) can be applied. When the tool has an optimization function for sharing XOR operations, the procedures (P7) and (P8) may be performed together using such a tool.
The arithmetic unit 110 is configured by the above procedure.
The AND calculation unit 111 calculates the AND value that is the result of the AND operation of the elements a and b of the Galois field. The AND value includes m×m components that are the results of the AND operation of the m components of the element a and the m components of the element b (hereinafter, arithmetic components). In the example of
For each of a plurality of mutually different sets of (u, v), the XOR calculation unit 112 calculates a^(2^u)×b^(2^v), which is a product of the 2^u-th power of the element a and the 2^v-th power of the element b, from the XOR operation based on the AND value and the connected tensor obtained by collecting the plurality of tensors different for each set.
Each of the plurality of tensors can be interpreted as including m×m components (tensor components) indicating whether or not each arithmetic component is used in the XOR operation. For example, the tensor component 1 represents that the corresponding arithmetic component is used in the XOR operation, and the tensor component 0 represents that the corresponding arithmetic component is not used in the XOR operation. The XOR calculation unit 112 performs the XOR operation using the arithmetic component indicated to be used in the XOR operation by the tensor component, that is, the arithmetic component corresponding to the tensor component 1, and calculates a^(2^u)×b^(2^v).
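The division of work between the AND calculation and the XOR calculation can be modeled in software as follows. This is a minimal sketch for GF (2^4) with p (x)=x^4+x+1 and the sets (u, v) = (0, 0), (1, 0), (2, 0), (0, 1); it assumes each tensor has the form (S^v)^T×T_i×S^u with the left index paired with b and the right index paired with a, performs no XOR sharing, and uses hypothetical names throughout.

```python
from itertools import product

M = 4
FEEDBACK = [1, 1, 0, 0]                     # alpha^4 = 1 + alpha
SETS = [(0, 0), (1, 0), (2, 0), (0, 1)]     # a*b, a^2*b, a^4*b, a*b^2

def alpha_powers(count):
    powers = [[1] + [0] * (M - 1)]
    while len(powers) < count:
        prev = powers[-1]
        shifted = [0] + prev[:-1]
        powers.append([s ^ (prev[-1] & f) for s, f in zip(shifted, FEEDBACK)])
    return powers

POW = alpha_powers(2 * M - 1)
T = [[[POW[j + k][i] for k in range(M)] for j in range(M)] for i in range(M)]
S = [[T[i][j][j] for j in range(M)] for i in range(M)]      # squaring matrix

def mat_mul(x, y):
    return [[sum(x[i][n] & y[n][j] for n in range(M)) % 2 for j in range(M)]
            for i in range(M)]

def mat_pow(mat, n):
    out = [[int(i == j) for j in range(M)] for i in range(M)]
    for _ in range(n):
        out = mat_mul(out, mat)
    return out

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def connected_tensor():
    """One row per (set, output component); each row is a flattened m x m tensor."""
    rows = []
    for u, v in SETS:
        s_u, s_v_t = mat_pow(S, u), transpose(mat_pow(S, v))
        for i in range(M):
            t = mat_mul(mat_mul(s_v_t, T[i]), s_u)
            rows.append([t[j][k] for j in range(M) for k in range(M)])
    return rows

CONNECTED = connected_tensor()              # 4*M rows, M*M columns

def and_unit(a, b):
    """The m x m arithmetic components b_j AND a_k, flattened like the tensor rows."""
    return [b[j] & a[k] for j in range(M) for k in range(M)]

def xor_unit(and_value):
    """XOR the arithmetic components selected by tensor component 1 in each row."""
    bits = [sum(r & x for r, x in zip(row, and_value)) % 2 for row in CONNECTED]
    return [bits[n * M:(n + 1) * M] for n in range(len(SETS))]

def gf_mul(a, b):
    return [sum(T[i][j][k] & a[j] & b[k] for j in range(M) for k in range(M)) % 2
            for i in range(M)]

def reference(a, b):
    """The same four products computed with separate multiplications."""
    a2, b2 = gf_mul(a, a), gf_mul(b, b)
    return [gf_mul(a, b), gf_mul(a2, b), gf_mul(gf_mul(a2, a2), b), gf_mul(a, b2)]

print(all(xor_unit(and_unit(list(a), list(b))) == reference(list(a), list(b))
          for a in product((0, 1), repeat=M)
          for b in product((0, 1), repeat=M)))      # True
```

A single call to and_unit serves all four products, which mirrors the point that only one AND stage is needed for the whole set.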
In the example of
Therefore, in the example of
In the lower part of
Each of the four multipliers performs the AND operation 100 times and the XOR operation 99 times. S and S^2 correspond to circuits for calculating the square and the fourth power for calculating arguments. The circuit (S) for calculating the square requires, for example, six XOR operations. The circuit (S^2) for calculating the fourth power requires, for example, 11 XOR operations. When these are summed up, 400 AND operations and 419 XOR operations are required in the comparative example.
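The totals for the comparative example can be reproduced with a few lines of arithmetic. The split of the argument circuits into two squaring circuits (for S_1^2 and S_3^2) and one fourth-power circuit (for S_1^4) is an assumption inferred from the four multiplications; it is the split under which the stated total of 419 is obtained.

```python
MULTIPLIERS = 4
AND_PER_MULTIPLIER = 100        # stated per-multiplier count for m = 10
XOR_PER_MULTIPLIER = 99         # stated per-multiplier count for m = 10
XOR_SQUARE = 6                  # circuit S (square)
XOR_FOURTH_POWER = 11           # circuit S^2 (fourth power)

# Assumed argument circuits: S for S_1^2, S^2 for S_1^4, and S for S_3^2.
and_total = MULTIPLIERS * AND_PER_MULTIPLIER
xor_total = MULTIPLIERS * XOR_PER_MULTIPLIER + 2 * XOR_SQUARE + XOR_FOURTH_POWER

print(and_total, xor_total)     # 400 419
```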
When the circuit scale is compared, for example, it is desirable to consider the number of transistors required for each operation (AND operation and XOR operation). If the ratio of the circuit scale between the AND operation and the XOR operation is set to, for example, 6:10 in consideration of the ratio of the number of transistors, the AND calculation unit 111 in
The arithmetic unit 110 in
Since the multiplication uses the same element c (=S1) as the argument, the calculation of the set where j=k is not required, and the input S1 may be output as it is. In addition, since the AND operation where j and k are interchanged leads to the same value, only one of the calculations may be performed. Therefore, the AND calculation unit 111 illustrated in
For each of a plurality of mutually different sets of (u, v), the XOR calculation unit 112 calculates c^(2^u)×c^(2^v), which is a product of the 2^u-th power of the element c and the 2^v-th power of the element c, from the XOR operation based on the AND value and the connected tensor obtained by collecting the plurality of tensors different for each set.
In the example of
In the example of
Next, the flow of decoding processing by the memory system 1 will be described.
The control unit 11 reads an error correction code from the non-volatile memory 20, and obtains the received word (step S101). In addition, the control unit 11 instructs the decoder 18 to start decoding.
The syndrome calculation unit 101 of the decoder 18 calculates the syndrome from the received word (step S102). The decoder 18 determines whether or not all the calculated values of the syndromes are 0 (step S103).
When all the syndromes are 0 (step S103: Yes), since it can be determined that there is no error in the received word, the decoder 18 ends the decoding processing. When not all the syndromes are 0 (step S103: No), the error locator polynomial calculation unit 102 calculates the error locator polynomial using the syndromes according to the PGZ method (step S104). At this time, the multiplication of the syndrome is calculated by the arithmetic unit 110 for at least some coefficients of the error locator polynomial.
The error position calculation unit 103 searches for the error position based on the calculated error locator polynomial (step S105). The bit flipping unit 104 corrects the error by inverting (bit flipping) the bit at the error position obtained by the searching (step S106), and ends the decoding processing.
As described above, according to the present embodiment, it is possible to suppress an increase in the circuit scale for performing the multiplication of the Galois field.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2022-193213 | Dec 2022 | JP | national |