Systems and Methods for Area Efficient Data Encoding

Information

  • Patent Application
  • Publication Number: 20150229331
  • Date Filed: August 27, 2014
  • Date Published: August 13, 2015
Abstract
The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.
Description
CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority to Russian Patent App. No. 2014104571 entitled “Systems and Methods for Area Efficient Data Encoding”, and filed Feb. 10, 2014 by Panteleev et al. The entirety of the aforementioned patent application is incorporated herein by reference for all purposes.


FIELD OF THE INVENTION

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.


BACKGROUND


Various data transfer systems have been developed, including storage systems, cellular telephone systems, and radio transmission systems. In each of these systems, data is transferred from a sender to a receiver via some medium. For example, in a storage system, data is sent from a sender (i.e., a write function) to a receiver (i.e., a read function) via a storage medium. Encoding may involve multiplying a vector by quasi-cyclic matrices. Such vector multiplication is complex both in terms of circuit design and in terms of the area required to implement the circuits. Such significant area requirements increase the cost of encoding devices.


Hence, for at least the aforementioned reasons, there exists a need in the art for advanced systems and methods for data processing.


SUMMARY

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.


Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes a cyclic convolution circuit and an encoded output circuit. The cyclic convolution circuit is operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output. The encoded output circuit is operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output.


This summary provides only a general outline of some embodiments of the invention. The phrases “in one embodiment,” “according to one embodiment,” “in various embodiments”, “in one or more embodiments”, “in particular embodiments” and the like generally mean the particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present invention, and may be included in more than one embodiment of the present invention. Importantly, such phrases do not necessarily refer to the same embodiment. Many other embodiments of the invention will become more fully apparent from the following detailed description, the appended claims and the accompanying drawings.





BRIEF DESCRIPTION OF THE FIGURES

A further understanding of the various embodiments of the present invention may be realized by reference to the figures which are described in remaining portions of the specification. In the figures, like reference numerals are used throughout several figures to refer to similar components. In some instances, a sub-label consisting of a lower case letter is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.



FIG. 1 shows a storage system having area efficient LDPC encoder circuitry in accordance with various embodiments of the present invention;



FIG. 2 shows a data transmission device including a transmitter having area efficient LDPC encoder circuitry in accordance with various embodiments of the present invention;



FIG. 3 shows a solid state memory circuit including a data processing circuit having area efficient LDPC encoder circuitry in accordance with some embodiments of the present invention;



FIG. 4a shows a processing system including an area efficient LDPC encoder circuit in accordance with some embodiments of the present invention;



FIG. 4b shows one implementation of an area efficient quasi-cyclic matrix multiplication circuit relying on a number of cyclic convolutions that may be used to implement the area efficient encoder circuit of FIG. 4a;



FIG. 4c depicts a cyclic convolution circuit that may be used to implement the area efficient quasi-cyclic matrix multiplication circuit of FIG. 4b;



FIG. 5a shows another implementation of an area efficient quasi-cyclic matrix multiplication circuit relying on a number of cyclic convolutions that may be used to implement the area efficient encoder circuit of FIG. 4a; and



FIG. 5b depicts one implementation of a parallel cyclic convolution circuit that may be used to implement the parallel cyclic convolution circuit of FIG. 5a.





DETAILED DESCRIPTION OF SOME EMBODIMENTS

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.


Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes one or more area efficient quasi-cyclic matrix multiplication circuit(s). Such quasi-cyclic matrix multiplication circuit(s) are designed as a number of cyclic convolutions. Using such an approach, it is possible to implement an encoder circuit for quasi-cyclic low density parity check (LDPC) codes that is smaller and offers several times higher throughput compared with an encoder circuit relying exclusively on shift registers and/or barrel shifters to perform quasi-cyclic matrix multiplications. In some cases, the quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions may use a combination of Winograd and Agarwal-Cooley fast convolution algorithms, though many other fast convolution algorithms can be used as well. Such Winograd and Agarwal-Cooley algorithms are discussed in detail in Richard E. Blahut, “Fast Algorithms for Digital Signal Processing,” Addison-Wesley, Reading, MA 1985. The entirety of the aforementioned reference is incorporated herein by reference for all purposes.


Most encoding algorithms for quasi-cyclic LDPC codes can be roughly divided into two main categories: generator matrix based (G-based) and parity-check matrix based (H-based). In a G-based encoder, a systematic quasi-cyclic generator matrix $G = (I \mid G_p)$ is used, where $G_p$ is a quasi-cyclic matrix that is usually dense. The parity bits vector $p$ is obtained by the formula $p = uG_p$, where $u$ is the user bits vector. In an H-based encoder, the quasi-cyclic parity-check matrix of the code is usually represented as $H = (H_u \mid H_p)$, where $H_u$ and $H_p$ are its quasi-cyclic sub-matrices corresponding to the user and parity parts of the codeword. The vector $s^T = H_u u^T$ is calculated first, and the parity vector $p$ is then determined as a solution of the equation $H_p p^T = s^T$. As can be seen from the above description, both categories of encoders involve a step that multiplies a vector by a quasi-cyclic matrix. As such, embodiments of the present invention that offer improved quasi-cyclic multiplication circuits offer improved encoding.
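To make the G-based path concrete, here is a minimal GF(2) sketch built on a hypothetical toy code: `G_P` below is a 6×6 quasi-cyclic matrix made of four 3×3 circulant blocks, chosen only for illustration, and the function name `encode_g_based` is ours. It shows that the work of the encoder reduces to the single vector-by-quasi-cyclic-matrix product $p = uG_p$.

```python
# Hypothetical toy parity part G_p over GF(2): a 2 x 2 grid of 3 x 3 circulant
# blocks. Within each block, every row is the row above it cyclically shifted
# right by one position.
G_P = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]


def encode_g_based(u, g_p):
    """Systematic G-based encoding over GF(2): returns (u | p) with p = u G_p."""
    p = [0] * len(g_p[0])
    for u_bit, g_row in zip(u, g_p):
        if u_bit:  # row vector times matrix: XOR in the matching row of G_p
            p = [pb ^ gb for pb, gb in zip(p, g_row)]
    return list(u) + p


codeword = encode_g_based([1, 0, 1, 1, 0, 0], G_P)
print(codeword)  # [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1]
```

The first six bits of the codeword are the user bits unchanged (the identity part of $G$); only the parity bits require the quasi-cyclic multiplication.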


Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes a cyclic convolution circuit and an encoded output circuit. The cyclic convolution circuit is operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output. The encoded output circuit is operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output. In some cases, the data processing system is implemented as part of a storage device, or a communication device. In various cases, the data processing system is implemented as part of an integrated circuit.


In some instances of the aforementioned embodiments, the encoded output circuit includes: a vector adder circuit operable to sum instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum, and a shift register circuit operable to shift instances of the vector sum to yield the instances of the cyclic convolution output. In some cases, the encoded data set is generated based at least in part on the cyclic convolution output. In various cases, the number of instances of the vector sum is l, where l corresponds to the number of sub-vectors into which the user data input is divided.


In various instances of the aforementioned embodiments, the cyclic convolution circuit includes: a first cyclic convolution circuit and a second cyclic convolution circuit. In such instances, the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input. In some cases, the first portion of the vector input is a 3×1 portion of the vector input, and the second portion of the vector input is a 3×4 portion of the vector input. In other cases, the first portion of the vector input is a 3×4 portion of the vector input, and the second portion of the vector input is a 3×8 portion of the vector input.


In one or more instances of the aforementioned embodiments, the systems further include a transformation circuit operable to transform a first number of bits of the user data input into a second number of bits of the vector input. In some such instances, the first number of bits is 128, and the second number of bits is 255. In various such instances, the cyclic convolution circuit includes: a first cyclic convolution circuit, a second cyclic convolution circuit, and a combining circuit. In such instances, the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, the first cyclic convolution circuit operates on a first portion of the vector input to yield a first sub-output, and the second cyclic convolution circuit operates on a second portion of the vector input to yield a second sub-output. The combining circuit is operable to combine at least the first sub-output and the second sub-output to yield a non-transformed output. In some cases, the system further includes an inverse transformation circuit operable to transform the second number of bits of the non-transformed output to the first number of bits of a cyclic convolution output.


Other embodiments of the present invention provide methods for data encoding that include: receiving a user data input; using a cyclic convolution circuit to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and generating an encoded data set corresponding to the user data input and based at least in part on the convolved output. In some instances of the aforementioned embodiments, the methods further include transforming a first number of bits of the user data input into a second number of bits to yield the vector input. In some cases, the first number of bits is 128, and the second number of bits is 255.


In one or more instances of the aforementioned embodiments, the cyclic convolution circuit includes: a first cyclic convolution circuit and a second cyclic convolution circuit. The first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit. The first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input. In some cases, the methods further include: adding instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and shifting instances of the vector sum to yield the instances of the cyclic convolution output.


Turning to FIG. 1, a storage system 100 is shown that includes a read channel 110 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. Storage system 100 may be, for example, a hard disk drive. Storage system 100 also includes a preamplifier 170, an interface controller 120, a hard disk controller 166, a motor controller 168, a spindle motor 172, a disk platter 178, and a read/write head 176. Interface controller 120 controls addressing and timing of data to/from disk platter 178, and interacts with a host controller (not shown). The data on disk platter 178 consists of groups of magnetic signals that may be detected by read/write head assembly 176 when the assembly is properly positioned over disk platter 178. In one embodiment, disk platter 178 includes magnetic signals recorded in accordance with either a longitudinal or a perpendicular recording scheme.


In a typical read operation, read/write head 176 is accurately positioned by motor controller 168 over a desired data track on disk platter 178. Motor controller 168 both positions read/write head 176 in relation to disk platter 178 and drives spindle motor 172 by moving read/write head assembly 176 to the proper data track on disk platter 178 under the direction of hard disk controller 166. Spindle motor 172 spins disk platter 178 at a determined spin rate (RPMs). Once read/write head 176 is positioned adjacent the proper data track, magnetic signals representing data on disk platter 178 are sensed by read/write head 176 as disk platter 178 is rotated by spindle motor 172. The sensed magnetic signals are provided as a continuous, minute analog signal representative of the magnetic data on disk platter 178. This minute analog signal is transferred from read/write head 176 to read channel circuit 110 via preamplifier 170. Preamplifier 170 is operable to amplify the minute analog signals accessed from disk platter 178. In turn, read channel circuit 110 decodes and digitizes the received analog signal to recreate the information originally written to disk platter 178. This data is provided as read data 103 to a receiving circuit. A write operation is substantially the opposite of the preceding read operation with write data 101 being provided to read channel circuit 110. This data is then encoded and written to disk platter 178.


In operation, data stored to disk platter 178 is encoded using an area efficient encoder circuit to yield an encoded data set. The encoded data set is then written to disk platter 178, and later accessed from disk platter 178 and decoded using a decoder circuit. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4b-4c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5a-5b. The area efficient encoder circuit may operate similarly to that discussed below in relation to FIG. 6.


It should be noted that storage system 100 may be integrated into a larger storage system such as, for example, a RAID (redundant array of inexpensive disks or redundant array of independent disks) based storage system. Such a RAID storage system increases stability and reliability through redundancy, combining multiple disks as a logical unit. Data may be spread across a number of disks included in the RAID storage system according to a variety of algorithms and accessed by an operating system as if it were a single disk. For example, data may be mirrored to multiple disks in the RAID storage system, or may be sliced and distributed across multiple disks in a number of techniques. If a small number of disks in the RAID storage system fail or become unavailable, error correction techniques may be used to recreate the missing data based on the remaining portions of the data from the other disks in the RAID storage system. The disks in the RAID storage system may be, but are not limited to, individual storage systems such as storage system 100, and may be located in close proximity to each other or distributed more widely for increased security. In a write operation, write data is provided to a controller, which stores the write data across the disks, for example by mirroring or by striping the write data. In a read operation, the controller retrieves the data from the disks. The controller then yields the resulting read data as if the RAID storage system were a single disk.


A data decoder circuit used in relation to read channel circuit 110 may be, but is not limited to, a low density parity check (LDPC) decoder circuit as are known in the art. Such low density parity check technology is applicable to transmission of information over virtually any channel or storage of information on virtually any media. Transmission applications include, but are not limited to, optical fiber, radio frequency channels, wired or wireless local area networks, digital subscriber line technologies, wireless cellular, Ethernet over any medium such as copper or optical fiber, cable channels such as cable television, and Earth-satellite communications. Storage applications include, but are not limited to, hard disk drives, compact disks, digital video disks, magnetic tapes and memory devices such as DRAM, NAND flash, NOR flash, other non-volatile memories and solid state drives.


In addition, it should be noted that storage system 100 may be modified to include solid state memory that is used to store data in addition to the storage offered by disk platter 178. This solid state memory may be used in parallel to disk platter 178 to provide additional storage. In such a case, the solid state memory receives and provides information directly to read channel circuit 110. Alternatively, the solid state memory may be used as a cache where it offers faster access time than that offered by disk platter 178. In such a case, the solid state memory may be disposed between interface controller 120 and read channel circuit 110 where it operates as a pass through to disk platter 178 when requested data is not available in the solid state memory or when the solid state memory does not have sufficient storage to hold a newly written data set. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of storage systems including both disk platter 178 and a solid state memory.


Turning to FIG. 2, a data transmission system 200 is shown that includes a transmitter 210 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. Transmitter 210 transmits encoded data via a transfer medium 230. Transfer medium 230 may be a wired or wireless transfer medium. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of transfer mediums that may be used in relation to different embodiments of the present invention. The encoded data is received from transfer medium 230 by a receiver 220. In operation, transmitter 210 encodes user data using an area efficient encoder circuit to yield an encoded data set. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4b-4c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5a-5b. The area efficient encoder circuit may operate similarly to that discussed below in relation to FIG. 6.


Turning to FIG. 3, another storage system 300 is shown that includes a data processing circuit 310 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. A host controller circuit 305 receives data to be stored (i.e., write data 301). Solid state memory access controller circuit 340 may be any circuit known in the art that is capable of controlling access to and from a solid state memory 350. Solid state memory access controller circuit 340 encodes a received data set to yield an encoded data set. The encoding is done using an area efficient LDPC encoder circuit, and results in an encoded data set that is stored to solid state memory 350. Solid state memory 350 may be any solid state memory known in the art. In some embodiments of the present invention, solid state memory 350 is a flash memory. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4b-4c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5a-5b. The area efficient encoder circuit may operate similarly to that discussed below in relation to FIG. 6.


Turning to FIG. 4a, a data processing system 400 is shown that includes an area efficient LDPC encoder circuit 420 in accordance with some embodiments of the present invention. Area efficient LDPC encoder circuit 420 applies a data encoding algorithm using matrix multiplication implemented as a number of cyclic convolutions. Area efficient LDPC encoder circuit 420 applies the encoding algorithm to an original data input 405 to yield an encoded output 439. Application of the encoding algorithm includes performing a number of vector multiplications by quasi-cyclic matrices implemented as a number of cyclic convolutions. The vector multiplications by quasi-cyclic matrices may be implemented similarly to that discussed below in relation to FIGS. 4b-4c.


Encoded output 439 is provided to a transmission circuit 430 that is operable to transmit the encoded data to a recipient via a medium 440. Transmission circuit 430 may be any circuit known in the art that is capable of transferring encoded output 439 via medium 440. Thus, for example, where data processing system 400 is part of a hard disk drive, transmission circuit 430 may include a read/write head assembly that converts an electrical signal into a series of magnetic signals appropriate for writing to a storage medium. Alternatively, where data processing system 400 is part of a wireless communication system, transmission circuit 430 may include a wireless transmitter that converts an electrical signal into a radio frequency signal appropriate for transmission via a wireless transmission medium. Transmission circuit 430 provides a transmission output to medium 440. Medium 440 provides a transmitted input that is the transmission output augmented with one or more errors introduced by the transference across medium 440.


Of note, original data input 405 may be any data set that is to be transmitted. For example, where data processing system 400 is a hard disk drive, original data input 405 may be a data set that is destined for storage on a storage medium. In such cases, a medium 440 of data processing system 400 is a storage medium. As another example, where data processing system 400 is a communication system, original data input 405 may be a data set that is destined to be transferred to a receiver via a transfer medium. Such transfer mediums may be, but are not limited to, wired or wireless transfer mediums. In such cases, a medium 440 of data processing system 400 is a transfer medium.


Data processing system 400 includes an analog processing circuit 450 that applies one or more analog functions to the transmitted input. Such analog functions may include, but are not limited to, amplification and filtering. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of pre-processing circuitry that may be used in relation to different embodiments of the present invention. In addition, analog processing circuit 450 converts the processed signal into a series of corresponding digital samples. Data processing circuitry 460 applies data detection and/or data decoding algorithms to the series of digital samples to yield a data output 465. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of data processing circuitry that may be used to recover original data input 405 from the series of digital samples.


As background to understanding an area efficient quasi-cyclic matrix multiplication circuit used to implement the area efficient encoder circuit 420, an l×l matrix over GF(q) is called a circulant if it has the following form:







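$$
\begin{pmatrix}
a_0 & a_{l-1} & \cdots & a_1 \\
a_1 & a_0 & \cdots & a_2 \\
\vdots & \vdots & \ddots & \vdots \\
a_{l-1} & a_{l-2} & \cdots & a_0
\end{pmatrix}.
$$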




Such a circulant matrix is uniquely determined by its first column $(a_0, a_1, \ldots, a_{l-1})^T$, and multiplying a vector $b$ by the circulant matrix can be written in the following way:







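$$
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{l-1} \end{bmatrix}
=
\begin{pmatrix}
a_0 & a_{l-1} & \cdots & a_1 \\
a_1 & a_0 & \cdots & a_2 \\
\vdots & \vdots & \ddots & \vdots \\
a_{l-1} & a_{l-2} & \cdots & a_0
\end{pmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{l-1} \end{bmatrix}.
$$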





The aforementioned multiplication may be represented in the following way:







$$
c_i = \sum_{j=0}^{l-1} a_j \, b_{(i-j) \bmod l}, \qquad i = 0, \ldots, l-1.
$$







The vector $c = (c_0, \ldots, c_{l-1})^T$ is referred to herein as the cyclic convolution of the vectors $a = (a_0, \ldots, a_{l-1})^T$ and $b = (b_0, \ldots, b_{l-1})^T$, and for simplicity is denoted as $a * b$.
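The equivalence between the sum formula and the circulant product can be checked numerically. The sketch below assumes GF(q) with q prime, so that field arithmetic is plain integer arithmetic modulo q (extension fields such as GF(4) would need a proper field implementation). It computes $a * b$ directly from the sum and compares it with multiplying $b$ by the circulant matrix built from $a$; the function names are illustrative.

```python
def cyclic_convolution(a, b, q=2):
    """c_i = sum_j a_j * b_{(i - j) mod l}, over GF(q) for prime q."""
    l = len(a)
    if len(b) != l:
        raise ValueError("both vectors must have length l")
    return [sum(a[j] * b[(i - j) % l] for j in range(l)) % q for i in range(l)]


def circulant(a):
    """l x l circulant with first column a: entry (i, k) equals a[(i - k) mod l]."""
    l = len(a)
    return [[a[(i - k) % l] for k in range(l)] for i in range(l)]


def mat_vec(m, b, q=2):
    """Ordinary matrix-vector product over GF(q) for prime q."""
    return [sum(x * y for x, y in zip(row, b)) % q for row in m]


a = [1, 0, 1, 1, 0]
b = [0, 1, 1, 0, 1]
print(cyclic_convolution(a, b))  # [1, 0, 0, 1, 1]
assert cyclic_convolution(a, b) == mat_vec(circulant(a), b)
```

Only the first column of each circulant needs to be stored, which is why the convolution form is attractive for hardware.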


A quasi-circulant matrix may be represented as follows:







$$
A =
\begin{pmatrix}
A_{11} & \cdots & A_{1n} \\
\vdots & \ddots & \vdots \\
A_{m1} & \cdots & A_{mn}
\end{pmatrix},
$$




where each block $A_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, is an $l \times l$ circulant matrix over a finite field GF(q). Given a column vector $u = (u_1, \ldots, u_n)^T$, where the sub-vectors $u_1, \ldots, u_n$ are each of length $l$, multiplying $u$ by the aforementioned quasi-circulant matrix yields:








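$$
\begin{bmatrix} v_1 \\ \vdots \\ v_m \end{bmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1n} \\
\vdots & \ddots & \vdots \\
A_{m1} & \cdots & A_{mn}
\end{pmatrix}
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix},
$$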




where each sub-vector vi of length l is given by the following formula:






$$
v_i = A_{i1} u_1 + \cdots + A_{in} u_n, \qquad i = 1, \ldots, m.
$$


Applying cyclic convolution, the preceding formula for each sub-vector vi of length l may be re-written as:






$$
v_i = a_{i1} * u_1 + \cdots + a_{in} * u_n, \qquad i = 1, \ldots, m,
$$


where $a_{ij}$ is the first column of the aforementioned circulant matrix $A_{ij}$, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Thus, quasi-cyclic multiplication can be obtained by performing $m \times n$ cyclic convolutions and $m \times (n-1)$ vector additions over GF(q).
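A minimal sketch of this decomposition over GF(2), where vector addition is XOR. `qc_multiply` follows $v_i = a_{i1} * u_1 + \cdots + a_{in} * u_n$ using only the stored first columns, and `dense_reference` expands each circulant block only to check the result. The toy parameters ($m = n = 2$, $l = 3$) and all names are illustrative assumptions.

```python
def cyclic_convolution(a, b):
    """GF(2) cyclic convolution: c_i = XOR over j of (a_j AND b_{(i - j) mod l})."""
    l = len(a)
    return [sum(a[j] & b[(i - j) % l] for j in range(l)) & 1 for i in range(l)]


def qc_multiply(first_cols, u_blocks):
    """v_i = a_i1 * u_1 + ... + a_in * u_n over GF(2), for each block row i.

    first_cols[i][j] is the first column a_ij of circulant block A_ij and
    u_blocks[j] is sub-vector u_j. Uses m*n convolutions and m*(n-1) XORs.
    """
    v = []
    for row in first_cols:
        acc = cyclic_convolution(row[0], u_blocks[0])
        for a_ij, u_j in zip(row[1:], u_blocks[1:]):
            conv = cyclic_convolution(a_ij, u_j)
            acc = [x ^ y for x, y in zip(acc, conv)]  # vector addition over GF(2)
        v.append(acc)
    return v


def dense_reference(first_cols, u_blocks):
    """Expand every A_ij from its first column and multiply densely (for checking)."""
    l = len(u_blocks[0])
    u = [bit for block in u_blocks for bit in block]
    v = []
    for row in first_cols:
        block_row = []
        for r in range(l):
            # Entry (r, k) of circulant A_ij is a_ij[(r - k) mod l].
            dense_row = [a_ij[(r - k) % l] for a_ij in row for k in range(l)]
            block_row.append(sum(x & y for x, y in zip(dense_row, u)) & 1)
        v.append(block_row)
    return v


first_cols = [[[1, 1, 0], [0, 1, 0]],   # a_11, a_12
              [[1, 0, 0], [1, 0, 1]]]   # a_21, a_22
u_blocks = [[1, 0, 1], [1, 0, 0]]       # u_1, u_2
print(qc_multiply(first_cols, u_blocks))  # [[0, 0, 1], [0, 0, 0]]
assert qc_multiply(first_cols, u_blocks) == dense_reference(first_cols, u_blocks)
```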


Turning to FIG. 4b, an implementation of an area efficient quasi-cyclic matrix multiplication circuit 470 relying on a number of cyclic convolutions is shown that may be used to implement the matrix multiplication circuitry of area efficient encoder circuit 420 of FIG. 4a. Area efficient quasi-cyclic matrix multiplication circuit 470 includes a read only memory circuit 475 pre-programmed with the first columns of circulant matrices 478 (i.e., the aforementioned $a_{ij}$).


Original data input 405 (i.e., $u_j$) and the first columns of circulant matrices 478 (i.e., $a_{ij}$) are provided to a cyclic convolution circuit 485 that applies cyclic convolution to the received inputs to yield a convolved output 482 (i.e., $a_{ij} * u_j$). Convolved output 482 is provided to a vector addition circuit 490 that is operable to calculate the sum of two vectors of length l over GF(q). In some embodiments of the present invention, vector addition circuit 490 is implemented using XOR gates as is known in the art. In particular, vector addition circuit 490 calculates the sum of convolved output 482 and an accumulated cyclic convolution output 497 over a length l. A resulting vector sum 492 is stored to a shift register circuit 495 where it is shifted over the length l, with the final shift yielding the final value of cyclic convolution output 497. Initially, all of the values in shift register circuit 495 are zeros. The final value of cyclic convolution output 497 may be represented by the following equation:





$$
\text{cyclic convolution output } 497 = a_{i1} * u_1 + \cdots + a_{in} * u_n, \qquad i = 1, \ldots, m.
$$


The approach used in area efficient quasi-cyclic matrix multiplication circuit 470 operates over m×n clock cycles plus the delay of cyclic convolution circuit 485. Original data input 405 ($u_j$) and the first columns of circulant matrices 478 ($a_{ij}$) should be presented in the following order:










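$$
\begin{array}{c|cccccccccc}
u_j & u_1 & u_1 & \cdots & u_1 & u_2 & u_2 & \cdots & u_2 & u_3 & \cdots \\
a_{ij} & a_{11} & a_{21} & \cdots & a_{m1} & a_{12} & a_{22} & \cdots & a_{m2} & a_{13} & \cdots
\end{array}
$$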
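The accumulate-and-shift loop of FIG. 4b can be modeled cycle by cycle. This is a behavioral sketch over GF(2), not the circuit itself, and it rests on one assumption of ours: shift register circuit 495 is modeled as a first-in first-out queue holding m vectors of length l. That depth is what makes the feedback value (cyclic convolution output 497) belong to the same block row i under the input ordering in the table above; the description does not state the depth explicitly. All names are hypothetical.

```python
from collections import deque


def cyclic_convolution(a, b):
    """GF(2) cyclic convolution, standing in for cyclic convolution circuit 485."""
    l = len(a)
    return [sum(a[j] & b[(i - j) % l] for j in range(l)) & 1 for i in range(l)]


def fig4b_datapath(first_cols, u_blocks):
    """Behavioral, cycle-by-cycle model of the FIG. 4b loop over GF(2).

    Assumption (ours): shift register 495 acts as a FIFO of m length-l
    vectors, one slot per block row i.
    """
    m, l = len(first_cols), len(u_blocks[0])
    shift_reg = deque([[0] * l for _ in range(m)])  # initially all zeros
    for j, u_j in enumerate(u_blocks):   # u_j is held for m consecutive cycles
        for i in range(m):               # while a_1j, ..., a_mj stream past it
            convolved = cyclic_convolution(first_cols[i][j], u_j)      # output 482
            feedback = shift_reg.popleft()                             # output 497
            vector_sum = [x ^ y for x, y in zip(convolved, feedback)]  # sum 492
            shift_reg.append(vector_sum)
    return list(shift_reg)  # after m*n cycles: [v_1, ..., v_m]


first_cols = [[[1, 1, 0], [0, 1, 0]],   # a_11, a_12
              [[1, 0, 0], [1, 0, 1]]]   # a_21, a_22
u_blocks = [[1, 0, 1], [1, 0, 0]]       # u_1, u_2
print(fig4b_datapath(first_cols, u_blocks))  # [[0, 0, 1], [0, 0, 0]]
```

The inner loop body runs exactly m×n times, matching the m×n clock cycles stated above (pipeline delay of the convolution circuit is not modeled).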











Turning to FIG. 4c, one implementation of a cyclic convolution circuit 900 for a length l of three is shown that may be used to implement area efficient quasi-cyclic matrix multiplication circuit 470 of FIG. 4b. As shown, cyclic convolution circuit 900 receives two vectors each of length three (i.e., ‘a’ and ‘b’). Vector ‘a’ includes a vector element 902 (a0), a vector element 904 (a1), and a vector element 906 (a2). Vector ‘b’ includes a vector element 908 (b0), a vector element 910 (b1), and a vector element 912 (b2). Where cyclic convolution circuit 900 is used in relation to area efficient quasi-cyclic matrix multiplication circuit 470, vector ‘a’ corresponds to original data input 405 (i.e., $u_j$), and vector ‘b’ corresponds to the first columns of circulant matrices 478 (i.e., $a_{ij}$).


Vector element 902 is provided to a multiplier circuit 922 where it is multiplied by vector element 908 to yield a product 942; vector element 902 is provided to a multiplier circuit 928 where it is multiplied by vector element 910 to yield a product 948; and vector element 902 is provided to a multiplier circuit 938 where it is multiplied by vector element 912 to yield a product 958. Vector element 904 is provided to a multiplier circuit 924 where it is multiplied by vector element 912 to yield a product 944; vector element 904 is provided to a multiplier circuit 930 where it is multiplied by vector element 908 to yield a product 950; and vector element 904 is provided to a multiplier circuit 936 where it is multiplied by vector element 910 to yield a product 956. Vector element 906 is provided to a multiplier circuit 926 where it is multiplied by vector element 910 to yield a product 946; vector element 906 is provided to a multiplier circuit 932 where it is multiplied by vector element 912 to yield a product 952; and vector element 906 is provided to a multiplier circuit 934 where it is multiplied by vector element 908 to yield a product 954.


Product 942, product 944, and product 946 are provided to an adder circuit 962 where they are summed to yield a vector component 972 (c0). Product 948, product 950, and product 952 are provided to an adder circuit 964 where they are summed to yield a vector component 974 (c1). Product 954, product 956, and product 958 are provided to an adder circuit 966 where they are summed to yield a vector component 976 (c2).
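The wiring of FIG. 4c can be written out directly. Assuming GF(2), each multiplier reduces to an AND gate and each adder to an XOR; the function below reproduces products 942 through 958 and adder circuits 962 through 966 as described, and an exhaustive check over all 64 input pairs confirms it matches the general formula $c_i = \sum_j a_j b_{(i-j) \bmod l}$ for $l = 3$. The function names are ours.

```python
from itertools import product


def fig4c_convolution(a, b):
    """Length-3 cyclic convolution over GF(2), wired as in FIG. 4c."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    c0 = (a0 & b0) ^ (a1 & b2) ^ (a2 & b1)  # products 942, 944, 946 -> adder 962
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a2 & b2)  # products 948, 950, 952 -> adder 964
    c2 = (a2 & b0) ^ (a1 & b1) ^ (a0 & b2)  # products 954, 956, 958 -> adder 966
    return [c0, c1, c2]


def reference(a, b):
    """General GF(2) cyclic convolution used only for checking."""
    l = len(a)
    return [sum(a[j] & b[(i - j) % l] for j in range(l)) & 1 for i in range(l)]


for a in product((0, 1), repeat=3):
    for b in product((0, 1), repeat=3):
        assert fig4c_convolution(a, b) == reference(a, b)

print(fig4c_convolution([1, 1, 0], [1, 0, 1]))  # [0, 1, 1]
```

The direct form needs l² multipliers, which is why the text turns to fast convolution algorithms once l grows.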


Where the length l of convolved output 482 is small, implementing area efficient quasi-cyclic matrix multiplication circuit 470 using blocks similar to that discussed in FIG. 4c may be acceptable. However, where the length l of convolved output 482 becomes larger, cyclic convolution circuit 485 may be implemented using one or more fast cyclic convolution algorithms known in the art. Turning to FIG. 5a, another implementation of an area efficient quasi-cyclic matrix multiplication circuit 500 is shown that relies on a number of cyclic convolutions and may be used to implement area efficient encoder circuit 420 of FIG. 4a. Area efficient quasi-cyclic matrix multiplication circuit 500 utilizes a parallel cyclic convolution circuit 540 implemented using a combination of Winograd and Agarwal-Cooley fast convolution algorithms to operate over the binary field GF(2).
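For larger l, the Agarwal-Cooley method rewrites a cyclic convolution of length $l = n_1 n_2$, with $n_1$ and $n_2$ coprime, as a two-dimensional $n_1 \times n_2$ cyclic convolution, so that a long convolution is assembled from short ones. The sketch below shows only this index mapping over GF(2): the short inner convolutions are computed directly, whereas a Winograd-based design would replace them with short-length algorithms that use fewer multiplications. It illustrates the general technique, not parallel cyclic convolution circuit 540; the lengths 3×5 and all names are our assumptions.

```python
import math


def cyclic_convolution(a, b):
    """Direct GF(2) cyclic convolution, used for the short inner convolutions."""
    l = len(a)
    return [sum(a[j] & b[(i - j) % l] for j in range(l)) & 1 for i in range(l)]


def agarwal_cooley(a, b, n1, n2):
    """Length n1*n2 GF(2) cyclic convolution via an n1 x n2 two-dimensional one.

    Requires gcd(n1, n2) == 1, so that i -> (i mod n1, i mod n2) is a ring
    isomorphism from Z_(n1*n2) to Z_n1 x Z_n2 (Chinese remainder theorem).
    """
    n = n1 * n2
    if math.gcd(n1, n2) != 1 or len(a) != n or len(b) != n:
        raise ValueError("need coprime n1, n2 and vectors of length n1 * n2")
    # Scatter both 1-D inputs into n1 x n2 arrays using the CRT index map.
    a2 = [[0] * n2 for _ in range(n1)]
    b2 = [[0] * n2 for _ in range(n1)]
    for i in range(n):
        a2[i % n1][i % n2] = a[i]
        b2[i % n1][i % n2] = b[i]
    # Outer length-n1 cyclic convolution whose "products" are inner
    # length-n2 cyclic convolutions and whose "sums" are row-wise XORs.
    c2 = [[0] * n2 for _ in range(n1)]
    for r1 in range(n1):
        for k1 in range(n1):
            inner = cyclic_convolution(a2[k1], b2[(r1 - k1) % n1])
            c2[r1] = [x ^ y for x, y in zip(c2[r1], inner)]
    # Gather the result back into 1-D order with the same index map.
    return [c2[i % n1][i % n2] for i in range(n)]


a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0]
b = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
assert agarwal_cooley(a, b, 3, 5) == cyclic_convolution(a, b)
```

Because the index map is a ring isomorphism, the cyclic wrap-around of the long convolution becomes independent wrap-arounds along each short dimension, which is what lets the short convolutions be optimized separately.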


Area efficient quasi-cyclic matrix multiplication circuit 500 includes a register circuit 510 that holds a number of bits of an original data input 505 in parallel. In one embodiment of the present invention, the number of bits is one-hundred twenty-eight (128). Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. The registered data is accessed in parallel from register circuit 510 as a registered vector 515. Registered vector 515 is provided to a transformation circuit 520 where the number of bits in registered vector 515 is increased to yield a transformed vector 525. The operation of transformation circuit 520 is more fully discussed below. In one embodiment of the present invention, the number of bits in transformed vector 525 is two-hundred fifty-five (255). Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. Transformed vector 525 is stored in a register circuit 530 that provides the registered data as a registered vector 535 (a′).


Similarly, area efficient quasi-cyclic matrix multiplication circuit 500 includes a register circuit 511 that holds a number of bits of an original data input 506 in parallel. In one embodiment of the present invention, the number of bits is one-hundred twenty-eight (128). Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. The registered data is accessed in parallel from register circuit 511 as a registered vector 516. Registered vector 516 is provided to a transformation circuit 521 where the number of bits in registered vector 516 is increased to yield a transformed vector 526. The operation of transformation circuit 521 is more fully discussed below. In one embodiment of the present invention, the number of bits in transformed vector 526 is two-hundred fifty-five (255). Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. Transformed vector 526 is stored in a register circuit 531 that provides the registered data as a registered vector 536 (b′).


Assuming the width of registered vector 535 and registered vector 536 is 255 bits, parallel cyclic convolution circuit 540 splits each of registered vector 535 and registered vector 536 into chunks s0(1), . . . , s0(12), s1(1), . . . , s1(12), s2(1), . . . , s2(12), where s stands for a (registered vector 535) or b (registered vector 536). The 1-bit chunks s0(1), s1(1), s2(1) are treated as elements of GF(2); the 4-bit chunks s0(2), s1(2), s2(2) are treated as elements of GF(2⁴); and the 8-bit chunks s0(3), s1(3), s2(3), . . . , s0(12), s1(12), s2(12) are treated as elements of GF(2⁸). Each of the three components s0, s1, s2 therefore spans 1 + 4 + 10×8 = 85 bits, for a total of 3×85 = 255 bits.
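A minimal sketch of the split, assuming (hypothetically) that the 255-bit vector is laid out as three consecutive 85-bit components s0, s1, s2, each holding its chunks in order of increasing index. The actual bit ordering produced by transformation circuits 520 and 521 follows from the permutation of TF described below and may differ; the names CHUNK_WIDTHS and split_into_chunks are illustrative.

```python
CHUNK_WIDTHS = [1, 4] + [8] * 10    # widths of chunks s_m(1) .. s_m(12)
COMPONENT_BITS = sum(CHUNK_WIDTHS)  # 85 bits per component
VECTOR_BITS = 3 * COMPONENT_BITS    # 255 bits in total

def split_into_chunks(vector_bits):
    """Split a 255-bit vector (list of 0/1) into chunks[m][i-1] = s_m(i).

    Each chunk is returned as an integer whose bit k is the k-th bit of the
    chunk, i.e. an element of GF(2), GF(2^4) or GF(2^8) in polynomial basis.
    The component/chunk layout is an assumption (see the lead-in).
    """
    if len(vector_bits) != VECTOR_BITS:
        raise ValueError("expected a 255-bit vector")
    chunks = []
    pos = 0
    for _component in range(3):
        row = []
        for width in CHUNK_WIDTHS:
            value = 0
            for k in range(width):
                value |= vector_bits[pos + k] << k
            row.append(value)
            pos += width
        chunks.append(row)
    return chunks

bits = [0] * VECTOR_BITS
bits[85] = 1                          # first bit of component s1
print(split_into_chunks(bits)[1][0])  # 1
```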


The aforementioned chunks are distributed among twelve cyclic convolution blocks (represented by blocks 550, 560, 570, 580) operating over the finite fields GF(2), GF(2⁴), and GF(2⁸), as shown in FIG. 5b. The field GF(2⁴) is defined by the primitive polynomial x⁴+x+1, and the field GF(2⁸) is defined by the irreducible polynomial x⁸+x⁴+x³+x+1. Turning to FIG. 5b, the ith cyclic convolution block calculates the cyclic convolution of the chunks a0(i), a1(i), a2(i) of registered vector 535 (a′) and the chunks b0(i), b1(i), b2(i) of registered vector 536 (b′). Each of the twelve cyclic convolution blocks calculates a cyclic convolution of length three (3) and can be implemented similar to cyclic convolution circuit 900 discussed above in relation to FIG. 4c, with its multipliers and adders operating in the corresponding finite field. In particular, a 3×1 block a0(1), a1(1), a2(1) is convolved with a 3×1 block b0(1), b1(1), b2(1) by block 550 to yield a 3×1 convolved output c0(1), c1(1), c2(1). A 3×4 block a0(2), a1(2), a2(2) is convolved with a 3×4 block b0(2), b1(2), b2(2) by block 560 to yield a 3×4 convolved output c0(2), c1(2), c2(2). A 3×8 block a0(3), a1(3), a2(3) is convolved with a 3×8 block b0(3), b1(3), b2(3) by block 570 to yield a 3×8 convolved output c0(3), c1(3), c2(3). A 3×8 block a0(12), a1(12), a2(12) is convolved with a 3×8 block b0(12), b1(12), b2(12) by block 580 to yield a 3×8 convolved output c0(12), c1(12), c2(12). The 3×8 blocks a0(4..11), a1(4..11), a2(4..11) and b0(4..11), b1(4..11), b2(4..11) are convolved by respective blocks (not shown) to yield respective 3×8 convolved outputs c0(4..11), c1(4..11), c2(4..11). Parallel cyclic convolution circuit 540 merges the resulting convolved outputs c0(1..12), c1(1..12), c2(1..12) to yield a cyclic output 545 (c′).
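Each 8-bit block performs the same three-point cyclic convolution as circuit 900, but with additions and multiplications in GF(2⁸). A minimal Python sketch, assuming elements are stored as 8-bit integers in polynomial basis modulo x⁸+x⁴+x³+x+1 (0x11B, the same reduction polynomial used by AES): addition is XOR, and multiplication is shift-and-add with reduction. The helper names are hypothetical; a GF(2⁴) block would use the same structure with a 4-bit multiplier reducing modulo x⁴+x+1.

```python
REDUCTION_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf256_mul(x, y):
    """Multiply two GF(2^8) elements (ints 0..255) modulo x^8+x^4+x^3+x+1."""
    result = 0
    while y:
        if y & 1:
            result ^= x          # add (XOR) the current multiple of x
        y >>= 1
        x <<= 1
        if x & 0x100:            # degree reached 8: reduce
            x ^= REDUCTION_POLY
    return result

def cyclic_convolution_3_gf256(a, b):
    """c_k = sum_i a_i * b_((k - i) mod 3), with GF(2^8) add (XOR) and multiply."""
    return tuple(
        gf256_mul(a[0], b[k % 3])
        ^ gf256_mul(a[1], b[(k - 1) % 3])
        ^ gf256_mul(a[2], b[(k - 2) % 3])
        for k in range(3)
    )

print(hex(gf256_mul(0x57, 0x83)))  # 0xc1
```

The value 0x57·0x83 = 0xC1 is the worked multiplication example from the AES specification, which uses the same field polynomial.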


Returning to FIG. 5a and assuming the width of registered vector 535 and registered vector 536 is 255, transformation circuit 520 and transformation circuit 521 each multiply their respective inputs, considered as vectors over GF(2), by a binary matrix T. Cyclic output 545 is provided to a register circuit 552, which stores the 255-bit vector and provides it as a vector output 555. Vector output 555 is provided to an inverse transformation circuit 562 that reverses the transformation applied by transformation circuit 520 and transformation circuit 521. Inverse transformation circuit 562 multiplies vector output 555 over GF(2) by a binary matrix T⁻¹. The multiplications performed by transformation circuit 520, transformation circuit 521, and inverse transformation circuit 562 may be implemented using XOR gates, as is known in the art.
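Multiplying a bit vector by a fixed binary matrix such as T or T⁻¹ reduces to XOR trees: each output bit is the parity of the input bits selected by one row of the matrix. The sketch below models such an XOR network in Python, assuming each matrix row is stored as an integer bitmask (bit j set when column j holds a 1); the 3×3 matrix is a hypothetical example, not the actual T.

```python
def gf2_matvec(rows, x):
    """Return y = M x over GF(2).

    rows: list of ints, rows[i] has bit j set when M[i][j] == 1.
    x:    int, bit j is input bit j.
    Each output bit is the parity of (rows[i] & x), i.e. one XOR tree.
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y

# Hypothetical 3x3 example matrix, rows as bitmasks (bit j = column j).
M = [0b011, 0b110, 0b101]
print(bin(gf2_matvec(M, 0b001)))  # 0b101
```

In hardware, each row becomes an XOR tree whose fan-in equals the number of ones in that row.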


In order to define the matrices T and T⁻¹, the following 3×3 block matrix TF, with 85×85 blocks, is defined:

$$T_F = \begin{pmatrix} T_{85} & 0 & 0 \\ 0 & T_{85} & 0 \\ 0 & 0 & T_{85} \end{pmatrix},$$

where T85 is an 85×85 matrix defined below, and the rows of TF are arranged by the following row permutation: for all i = 1 to 255, move row number 1 + 85((i−1) mod 3) + ((i−1) mod 85) to place number i. The transformation matrix T is then obtained from TF by removing the last 127 columns, which leaves a 255×128 matrix. Writing TF⁻¹ for the inverse of TF and r_i for the ith row of TF⁻¹, the inverse transformation matrix T⁻¹ (a 128×255 matrix) is obtained as follows:

$$T^{-1} = \begin{pmatrix} r_1 + r_{129} \\ r_2 + r_{130} \\ \vdots \\ r_{127} + r_{255} \\ r_{128} \end{pmatrix}.$$
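One way to read the shapes of T and T⁻¹ (an interpretation, not stated in the source): because 255 = 2 × 128 − 1, a 255-point cyclic convolution of two 128-bit vectors that are zero-padded to 255 bits equals their linear convolution. Dropping the last 127 columns of TF amounts to that zero-padding, and a row of the form r_j + r_(j+128) folds coefficient k + 128 of the linear result onto coefficient k (0-based). This yields a 128-point cyclic convolution, that is, a product with a 128×128 circulant; coefficient 127 has no partner, matching the final row r_128. The Python sketch below checks this identity over GF(2) by direct computation. It does not model TF itself, and the helper names are hypothetical.

```python
def cyclic_conv_gf2(a, b):
    """Cyclic convolution over GF(2) by definition."""
    n = len(a)
    return [sum(a[i] & b[(k - i) % n] for i in range(n)) % 2 for k in range(n)]

def zero_pad(v, length):
    """Append zeros; models keeping only the first len(v) columns of TF."""
    return list(v) + [0] * (length - len(v))

def fold(lin):
    """Row pattern of T^-1 (0-based): out_k = lin_k XOR lin_(k+n) for k < n-1,
    and out_(n-1) = lin_(n-1), where len(lin) == 2n - 1."""
    n = (len(lin) + 1) // 2
    out = list(lin[:n])
    for k in range(n - 1):
        out[k] ^= lin[k + n]
    return out

def cyclic_via_padding(a, b):
    """n-point cyclic convolution computed through a (2n-1)-point one."""
    n = len(a)
    padded = 2 * n - 1            # 255 when n == 128
    lin = cyclic_conv_gf2(zero_pad(a, padded), zero_pad(b, padded))
    return fold(lin)

# Deterministic 128-bit test vectors (arbitrary bit patterns).
a = [((i * 37) >> 2) & 1 for i in range(128)]
b = [((i * i + 3 * i) >> 1) & 1 for i in range(128)]
print(cyclic_via_padding(a, b) == cyclic_conv_gf2(a, b))  # True
```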





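The aforementioned T85 matrix is obtained by factoring the polynomial x⁸⁵+1 into irreducible factors (i.e., primes) over GF(2):

$$x^{85}+1 = f^{(1)}(x)\, f^{(2)}(x) \cdots f^{(12)}(x),$$

where

$$\begin{aligned}
f^{(1)}(x) &= x+1, \\
f^{(2)}(x) &= x^4+x^3+x^2+x+1, \\
f^{(3)}(x) &= x^8+x^7+x^6+x^4+x^2+x+1, \\
f^{(4)}(x) &= x^8+x^7+x^5+x+1, \\
f^{(5)}(x) &= x^8+x^7+x^3+x+1, \\
f^{(6)}(x) &= x^8+x^5+x^4+x^3+1, \\
f^{(7)}(x) &= x^8+x^5+x^4+x^3+x^2+x+1, \\
f^{(8)}(x) &= x^8+x^6+x^5+x^4+x^2+x+1, \\
f^{(9)}(x) &= x^8+x^6+x^5+x^4+x^3+x+1, \\
f^{(10)}(x) &= x^8+x^7+x^6+x^4+x^3+x^2+1, \\
f^{(11)}(x) &= x^8+x^7+x^5+x^4+x^3+x^2+1, \quad \text{and} \\
f^{(12)}(x) &= x^8+x^7+x^6+x^5+x^4+x^3+1.
\end{aligned}$$

The degrees of these factors are 1, 4, and 8 (ten times), which sum to 85 and match the chunk widths used by parallel cyclic convolution circuit 540.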
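The factorization can be checked in software by representing GF(2) polynomials as integers (bit k holds the coefficient of x^k), dividing x⁸⁵+1 by each factor, and multiplying the factors back together. In this Python sketch, with hypothetical helper names, the degree-4 factor is taken as x⁴+x³+x²+x+1. The polynomial x⁴+x³+x²+1 vanishes at x = 1, so it is divisible by x+1 and cannot be an irreducible factor. The script prints its checks rather than asserting the results for all twelve factors.

```python
def poly(*exponents):
    """Build a GF(2) polynomial as an int: bit k is the coefficient of x^k."""
    value = 0
    for e in exponents:
        value ^= 1 << e
    return value

def poly_mod(a, m):
    """Remainder of a divided by m over GF(2) (long division with XOR)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

def poly_mul(a, b):
    """Carry-less (GF(2)) polynomial product."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

X85_PLUS_1 = poly(85, 0)

# Factors f(1) .. f(12), written by their exponents.
FACTORS = [
    poly(1, 0),
    poly(4, 3, 2, 1, 0),
    poly(8, 7, 6, 4, 2, 1, 0),
    poly(8, 7, 5, 1, 0),
    poly(8, 7, 3, 1, 0),
    poly(8, 5, 4, 3, 0),
    poly(8, 5, 4, 3, 2, 1, 0),
    poly(8, 6, 5, 4, 2, 1, 0),
    poly(8, 6, 5, 4, 3, 1, 0),
    poly(8, 7, 6, 4, 3, 2, 0),
    poly(8, 7, 5, 4, 3, 2, 0),
    poly(8, 7, 6, 5, 4, 3, 0),
]

degrees = [f.bit_length() - 1 for f in FACTORS]
product = 1
for f in FACTORS:
    product = poly_mul(product, f)

print("degree sum:", sum(degrees))
print("each factor divides x^85+1:",
      all(poly_mod(X85_PLUS_1, f) == 0 for f in FACTORS))
print("product equals x^85+1:", product == X85_PLUS_1)
print("x^4+x^3+x^2+1 divisible by x+1:",
      poly_mod(poly(4, 3, 2, 0), poly(1, 0)) == 0)
```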


Let d_i = deg f(i)(x) for i = 1 to 12, and let T_i be the d_i×85 matrix whose jth column is equal to (c_0, . . . , c_(d_i−1))ᵀ, where

$$c_0 + c_1x + \cdots + c_{d_i-1}x^{d_i-1} = x^{j-1} \bmod f^{(i)}(x), \qquad i = 1, \dots, 12,\; j = 1, \dots, 85.$$

Each irreducible polynomial f(i)(x) defines the finite field F(i) = GF(2)[x]/(f(i)(x)) of polynomials over GF(2) modulo f(i)(x). The field F(1) is isomorphic to the field GF(2), the field F(2) is isomorphic to the field GF(2⁴) defined by the irreducible polynomial x⁴+x+1, and the fields F(3), . . . , F(12) are isomorphic to the field GF(2⁸) defined by the irreducible polynomial x⁸+x⁴+x³+x+1. Let B_i be the d_i×d_i transition matrix from the field F(i) to the corresponding isomorphic field; that is, if a binary column vector a represents an element of the field F(i), then the vector B_i a represents the corresponding element of the isomorphic field. The matrix T85 can then be calculated by the following formula:

$$T_{85} = \begin{pmatrix} B_1 T_1 \\ \vdots \\ B_{12} T_{12} \end{pmatrix}.$$
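Column j of T_i holds x^(j−1) mod f(i)(x), so by linearity T_i maps the coefficient vector of a(x) = a_0 + a_1x + . . . + a_84x^84 to the coefficients of a(x) mod f(i)(x). This is the residue step of a polynomial Chinese-remainder (Winograd-style) decomposition. The following Python sketch follows that reading and stores polynomials and matrix columns as integers (bit k holds the coefficient of x^k); the helper names are hypothetical. For f(1)(x) = x+1 every column equals 1, and B_1 can only be the 1×1 identity, so the first row of T85 consists entirely of ones.

```python
def poly_mod(a, m):
    """Remainder of GF(2) polynomial a modulo m (ints, bit k = coeff of x^k)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

def t_matrix_columns(f, n=85):
    """Columns of T_i: column j (0-based) is x^j mod f, i.e. x^(j-1) for 1-based j."""
    return [poly_mod(1 << j, f) for j in range(n)]

def apply_columns(columns, a):
    """Matrix-vector product over GF(2): XOR the columns selected by bits of a."""
    out = 0
    for j, col in enumerate(columns):
        if (a >> j) & 1:
            out ^= col
    return out

F2 = 0b11111                        # x^4 + x^3 + x^2 + x + 1
T2 = t_matrix_columns(F2)
a = (1 << 84) | (1 << 40) | 0b1011  # an arbitrary 85-bit input polynomial
print(apply_columns(T2, a) == poly_mod(a, F2))  # True
print(T2[:6])                                   # [1, 2, 4, 8, 15, 1]
```

The columns repeat with period 5 because x⁵ ≡ 1 modulo x⁴+x³+x²+x+1.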





The resulting matrix T85 is as follows: 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111100011000110001100011000110001100011000110001100011000110001100011000110001100011000100011000110001100011000110001100011000110001100011000110001100011000110001100011000110010100101001010010100101001010010100101001010010100101001010010100101001010010100101011110111101111011110111101111011110111101111011110111101111011110111101111011110111111110100110110010111101001101100101111010011011001011110100110110010111101001101100100100111011011100101001110110111001010011101101110010100111011011100101001110110111001010110000000110100101100000001101001011000000011010010110000000110100101100000001101001110011001110000011100110011100000111001100111000001110011001110000011100110011100000011110001000001000111100010000010001111000100000100011110001000001000111100010000010011011001011110100110110010111101001101100101111010011011001011110100110110010111101000000010111011101000000101110111010000001011101110100000010111011101000000101110111010000010001111000100000100011110001000001000111100010000010001111000100000100011110001110111101101000110000011010010001000111110111010111000100100101011111111010011111100101110101110001001001010111111110100111111001110111101101000110000011010010001000111110100111000011110101001011001011110001110011010101011000101010001011010110110000100000010001011010110110000100000010011100001111010100101100101111000111001101010101100010100110010000110011110010011011011111000000011101000001010011000110111001010000101110110100011000001101001000100011111011101011100010010010101111111101001111110011101111011010101000101101011011000010000001001110000111101010010110010111100011100110101010110000101110110011001000011001111001001101101111100000001110100000101001100011011100101001011101111100010001001011000001100010110111101110011111100101111111101010010010001110011000001100010110111101110011111100101111111101010010010001110101110111110001000100100010
01100110111010000101001110110001100101000001011100000001111101101100100111100110000011111011011001001111001100001001100110111010000101001110110001100101000001011100001111000011100100000010000110110101101000101010001101010101100111000111101001101001010111010111011111000100010010110000011000101101111011100111111001011111111010100100100001111101101100100111100110000100110011011101000010100111011000110010100000101110000000001110010000001000011011010110100010101000110101010110011100011110100110100101011111000111111110001010001111111100010100011111111000101000111111110001010001111111100010000111101101111000001111011011110000011110110111100000111101101111000001111011011110000010101010000110000101010100001100001010101000011000010101010000110000101010100001100100110001100101101001100011001011010011000110010110100110001100101101001100011001011000100011011000100001000110110001000010001101100010000100011011000100001000110110001000011111111000101000111111110001010001111111100010100011111111000101000111111110001010010001001011110100100010010111101001000100101111010010001001011110100100010010111101000010001101100010000100011011000100001000110110001000010001101100010000100011011000111110001011001111000001011101000010101001101111011011000010001110111011001011111011110101111101111111100010110011110000010111010000101010011011110110110000100011101110110011001001100110101110000110000000100111010100010000111001110001111110101100011011010000011000000010011101010001000011100111000111111010110001101101000110010011001101011100010010000011110100101101110010101101010101111001000101000000110100111110011000100101000101010011011110110110000100011101110110010111110111111110001011001111000001011101000010000111001110001111110101100011011010001100100110011010111000011000000010011101010000001101001111100110001001010010010000011110100101101110010101101010101111001000101111011011000101110011111000011111101111010000001110101110110011101110001001010101000101010001111011011000101110011111000011111101111010
00000111010111011001110111000100101011101001011110001100110000101001001000011010001010111110101011010110000000101100101101010010010000110100010101111101010110101100000001011001011011101001011110001100110000010011110011010100110010011011111111001000110110100111001010000100010000011000111000000111111011110100000011101011101100111011100010010101010001111011011000101110011111000100001101000101011111010101101011000000010110010110111010010111100011001100001010010000100010000011000111000001001111001101010011001001101111111100100011011010011100101101110011101011011001010011110110101111010001101111000111001011000100100101110100110101101011110100011011110001110010110001001001011101001101101110011101011011001010011110011001101000011010101011100010111111110010011100000001011010010000001110111011000001001101010101110001011111111001001110000000101101001000000111011101100000100110011010000010000100011110011000110010001010100101011001111110111110101000001100001111100001010011101011011001010011110110101111010001101111000111001011000100100101110100110110111010000001110111011000001001100110100001101010101110001011111111001001110000000101101000101000100001000111100110001100100010101001010110011111101111101010000011000011111001000000101111011111100001111100111010001101101111000101010100100011101110011011101011000101111011111100001111100111010001101101111000101010100100011101110011011101011100001101100010011111111011001001100101011001111001000001110001100000100010000101001110010010100111001011011000100111111110110010011001010110011110010000011100011000001000100001101000000011010110101011111010100010110000100100101000011001100011110100101110110101110100011011011110001010101001000111011100110111010111000000101111011111100001111100011001010110011110010000011100011000001000100001010011100101101100010011111111011001000110011000111101001011101101001101000000011010110101011111010100010110000100100101010001101001110001111011000101111010110111100101001101101011100111011011001011101001000101110011
10110110010111010010010001101001110001111011000101111010110111100101001101101001011010000000111001001111111101000111010101011000010110011001000001101110111000000111010101011000010110011001000001101110111000000100101101000000011100100111111110100001010000111110000110000010101111101111110011010100101010001001100011001111000100001001101100101110100100100011010011100011110110001011110101101111001010011011010111001110000001110010011111111010001110101010110000101100110010000011011101110000001001011010000001010111110111111001101010010101000100110001100111100010000100010100001111100001110000110110111101100101010000101110100000111100110100011111111011111010011011101110000101010000101110100000111100110100011111111011111010011011101110001000011011011110110010010100100011001111100101100000010100010011110101010110101001110110100101111000001000101001000110011111001011000000101000100111101010101101010011101101001011110000010010011100111000010001010111001000000011000011101011001100100110001011011000110101111110011001010100001011101000001111001101000111111110111110100110111011100010000110110111100100100101001000110011111001011000000101000100111101010101101010011101101001011110000001101011111100011100111000010001010111001000000011000011101011001100100110001011011
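The source does not specify how the transition matrices B_i in the formula for T85 are chosen. One standard construction is offered here as an assumption. Pick a root β of f(i)(x) in the target field; the map x ↦ β sends a(x) mod f(i)(x) to a(β), so column k of B_i (k = 0, . . . , d_i − 1) is β^k written in the target field's basis. Different roots give different but equally valid isomorphisms, so a B_i built this way need not reproduce the bits of the T85 listed above. The Python sketch below covers the GF(2⁸) case, with the target field taken modulo x⁸+x⁴+x³+x+1; the helper names are hypothetical.

```python
TARGET_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 defines the target GF(2^8)

def gf256_mul(x, y):
    """Multiply in the target GF(2^8) (shift-and-add with reduction)."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= TARGET_POLY
    return result

def gf256_eval(f, beta):
    """Evaluate GF(2) polynomial f (bit k = coeff of x^k) at beta, Horner's rule."""
    result = 0
    for k in range(f.bit_length() - 1, -1, -1):
        result = gf256_mul(result, beta)
        if (f >> k) & 1:
            result ^= 1
    return result

def transition_columns(f):
    """Columns of B_i: beta^0 .. beta^(d-1) for the smallest root beta of f."""
    beta = next(b for b in range(1, 256) if gf256_eval(f, b) == 0)
    cols, power = [], 1
    for _ in range(f.bit_length() - 1):
        cols.append(power)
        power = gf256_mul(power, beta)
    return cols

def apply_columns(cols, a):
    """B_i * a over GF(2): XOR of the columns selected by the bits of a."""
    out = 0
    for k, col in enumerate(cols):
        if (a >> k) & 1:
            out ^= col
    return out

def poly_mulmod(a, b, f):
    """(a * b) mod f over GF(2): multiplication in the source field F(i)."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    d = f.bit_length()
    while prod.bit_length() >= d:
        prod ^= f << (prod.bit_length() - d)
    return prod

F3 = 0b111010111  # f(3)(x) = x^8 + x^7 + x^6 + x^4 + x^2 + x + 1
B3 = transition_columns(F3)
u, v = 0x53, 0xCA  # two arbitrary elements of F(3)
lhs = apply_columns(B3, poly_mulmod(u, v, F3))
rhs = gf256_mul(apply_columns(B3, u), apply_columns(B3, v))
print(lhs == rhs)  # True: B3 preserves multiplication
```

The GF(2⁴) case for F(2) follows the same pattern with a 4-bit multiplier reducing modulo x⁴+x+1, and B_1 is the 1×1 identity.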


It should be noted that the various blocks discussed in the above application may be implemented in integrated circuits along with other functionality. Such integrated circuits may include all of the functions of a given block, system or circuit, or a subset of the block, system or circuit. Further, elements of the blocks, systems or circuits may be implemented across multiple integrated circuits. Such integrated circuits may be any type of integrated circuit known in the art including, but not limited to, a monolithic integrated circuit, a flip chip integrated circuit, a multichip module integrated circuit, and/or a mixed signal integrated circuit. It should also be noted that various functions of the blocks, systems or circuits discussed herein may be implemented in either software or firmware. In some such cases, the entire system, block or circuit may be implemented using its software or firmware equivalent. In other cases, one part of a given system, block or circuit may be implemented in software or firmware, while other parts are implemented in hardware.


In conclusion, the invention provides novel systems, devices, methods and arrangements for data processing. While detailed descriptions of one or more embodiments of the invention have been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art without departing from the spirit of the invention. Therefore, the above description should not be taken as limiting the scope of the invention, which is defined by the appended claims.

Claims
  • 1. A data processing system, the system comprising: an encoder circuit including: a cyclic convolution circuit operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and an encoded output circuit operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output.
  • 2. The data processing system of claim 1, wherein the encoded output circuit comprises: a vector adder circuit operable to sum instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and a shift register circuit operable to shift instances of the vector sum to yield the instances of the cyclic convolution output.
  • 3. The data processing system of claim 2, wherein the encoded data set is generated based at least in part on the cyclic convolution output.
  • 4. The data processing system of claim 2, wherein the number of instances of the vector sum is l, and wherein l corresponds to the number of sub-vectors into which the user data input is divided.
  • 5. The data processing system of claim 1, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; and a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input.
  • 6. The data processing system of claim 5, wherein the first portion of the vector input is a 3×1 portion of the vector input, and wherein the second portion of the vector input is a 3×4 portion of the vector input.
  • 7. The data processing system of claim 5, wherein the first portion of the vector input is a 3×4 portion of the vector input, and wherein the second portion of the vector input is a 3×8 portion of the vector input.
  • 8. The data processing system of claim 1, wherein the system further comprises: a transformation circuit operable to transform a first number of bits of the user data input into a second number of bits of the vector input.
  • 9. The data processing system of claim 8, wherein the first number of bits is 128, and wherein the second number of bits is 255.
  • 10. The data processing system of claim 8, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input to yield a first sub-output and the second cyclic convolution circuit operates on a second portion of the vector input to yield a second sub-output; and a combining circuit operable to combine at least the first sub-output and the second sub-output to yield a non-transformed output.
  • 11. The data processing system of claim 10, wherein the system further includes: an inverse transformation circuit operable to transform the second number of bits of the non-transformed output to the first number of bits of a cyclic convolution output.
  • 12. The data processing system of claim 1, wherein the data processing system is implemented as part of a device selected from a group consisting of: a storage device, and a communication device.
  • 13. The data processing system of claim 1, wherein the data processing system is implemented as part of an integrated circuit.
  • 14. A method for data encoding, the method comprising: receiving a user data input; using a cyclic convolution circuit to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and generating an encoded data set corresponding to the user data input and based at least in part on the convolved output.
  • 15. The method of claim 14, the method further comprising: transforming a first number of bits of the user data input into a second number of bits to yield the vector input.
  • 16. The method of claim 15, wherein the first number of bits is 128, and wherein the second number of bits is 255.
  • 17. The method of claim 14, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; and a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input.
  • 18. The method of claim 17, wherein the first portion of the vector input is a 3×1 portion of the vector input, and wherein the second portion of the vector input is selected from a group consisting of: a 3×4 portion of the vector input, and a 3×8 portion of the vector input.
  • 19. The method of claim 14, wherein the method further comprises: adding instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and shifting instances of the vector sum to yield the instances of the cyclic convolution output.
  • 20. A data storage device, the device comprising: a storage medium; a head disposed in relation to the storage medium and operable to write an encoded data set to the storage medium; an encoder circuit including: a cyclic convolution circuit operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and an encoded output circuit operable to generate the encoded data set corresponding to the user data input and based at least in part on the convolved output.
Priority Claims (1)
Number Date Country Kind
2014104571/08 Feb 2014 RU national