The present invention relates to the field of circuits for shifting data, and more particularly, the present invention relates to barrel rotators.
A barrel shifter is a digital circuit that can shift a data word or an ordered list of elements by a specified number of bits or positions in a single clock cycle. Barrel rotators are a sub-category of barrel shifters that perform circular-shift operations. A circular, or cyclic, shift is the operation of rearranging the entries in an ordered list of elements, either by moving the final entry to the first position while shifting all other entries to the next position, or by performing the inverse operation. The result of applying s successive circular shifts to a given set of entries is called a circular shift by s positions. Shifting and rotating data is required in several applications, including arithmetic operations, variable-length coding, and bit-indexing. Barrel rotators are often utilized by embedded digital signal processors and general-purpose processors to manipulate data.
Barrel rotators are widely used in Low Density Parity Check (LDPC) decoders. LDPC codes are a subcategory of linear block error correction codes characterized by a sparse parity check matrix. This means that the parity check matrix consists mainly of 0's and a relatively small number of 1's. LDPC codes were first introduced in the 1960's but have more recently received increased attention. This is due at least in part to inherent parallelism in decoding which makes LDPC codes suitable for hardware implementation and due to flexibility in designing LDPC codes, which allows LDPC codes to be used in a variety of applications.
A bipartite Tanner graph is a widely used way to represent a parity check matrix H. This graph consists of two sets of nodes, namely the check nodes and the variable nodes. Each row of H corresponds to a parity check equation, graphically represented as a check node of the Tanner graph, while each column corresponds to a codeword bit, graphically represented as a variable node. A 1 in the H matrix indicates a connection between the corresponding variable and check nodes. Message passing algorithms for decoding LDPC codes operate by iteratively passing information along the edges of the Tanner graph. In a sense, the variable nodes correspond to bits of a received word (both message and parity), while check nodes correspond to parity check equations.
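The row/column correspondence described above can be illustrated with a minimal Python sketch. The function name `tanner_graph` and the toy matrix `H_EXAMPLE` are hypothetical and chosen only for illustration; the matrix is not an LDPC code taken from this description.

```python
def tanner_graph(H):
    """Return (check_to_vars, var_to_checks) adjacency lists for a 0/1 matrix H.

    Row r of H is check node r; column c of H is variable node c.
    An entry equal to 1 is an edge between the two nodes.
    """
    num_checks = len(H)
    num_vars = len(H[0])
    check_to_vars = [[c for c in range(num_vars) if H[r][c] == 1]
                     for r in range(num_checks)]
    var_to_checks = [[r for r in range(num_checks) if H[r][c] == 1]
                     for c in range(num_vars)]
    return check_to_vars, var_to_checks


# Toy matrix for illustration only (assumption, not from the text).
H_EXAMPLE = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

checks, variables = tanner_graph(H_EXAMPLE)
print("check node -> variable nodes:", checks)
print("variable node -> check nodes:", variables)
```

Each edge appears once in each list, so a message passing decoder can walk either list depending on which half-iteration it is computing.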
Decoding of LDPC codes often requires shift or shuffle operations to route information between processing elements or to/from memories. This is particularly true for some kinds of LDPC codes, including Quasi-Cyclic LDPC (QC-LDPC) codes. QC-LDPC codes are a subcategory of LDPC codes characterized by parity check matrices comprised of square sub-matrices. Each of these sub-matrices is either a z×z zero sub-matrix or a z×z right circularly shifted identity sub-matrix.
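As a small illustration of the two kinds of sub-matrix named above, the following Python sketch builds a z×z zero block and a z×z right circularly shifted identity block. The helper names are hypothetical, and the sketch assumes that "right circularly shifted" means every row of the identity matrix is rotated to the right by the same number of columns.

```python
def zero_block(z):
    """z x z all-zero sub-matrix."""
    return [[0] * z for _ in range(z)]


def shifted_identity(z, shift):
    """z x z identity matrix with each row circularly shifted right by `shift`.

    Assumed convention: the 1 of row `row` sits in column (row + shift) mod z.
    """
    return [[1 if col == (row + shift) % z else 0 for col in range(z)]
            for row in range(z)]


block = shifted_identity(3, 1)
for row in block:
    print(row)

# Every row and every column of a shifted identity block holds exactly one 1.
column_sums = [sum(block[r][c] for r in range(3)) for c in range(3)]
print("column sums:", column_sums)
```

Because each column holds a single 1, the z rows of such a block never touch the same codeword bit, which is what makes the structure convenient for parallel hardware.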
QC-LDPC codes are widely used in error correction systems because they reduce hardware complexity while offering performance comparable to that of randomly constructed codes. The particular structure of the QC-LDPC codes ensures that there is at most one 1 in every column of the z×z square sub-matrices which compose these codes. This allows the parallel processing of up to z lines of the parity check matrix without data conflicts. As mentioned before and depicted in
Telecommunication standards, such as WiMAX and WiFi, support a variety of codes. An LDPC decoder architecture suitable for the 802.11n/ac IEEE standards would ideally support three codeword lengths (648, 1296, 1944) and four coding rates (1/2, 2/3, 3/4, 5/6) in order to implement the twelve LDPC parity check matrices defined by the standard. To accommodate multiple codes of different characteristics, such as length, rate, and check degrees of each parity check matrix, there is a need for a reconfigurable interconnection network that efficiently realizes connectivity between variable and check node processors of a decoder.
The present invention is directed toward reconfigurable barrel shifters and rotators. A barrel shifter comprises an array of multiplexers, the array having a plurality of inputs and a plurality of outputs and wherein the array of multiplexers is configured to rotate a set of n input messages applied to the inputs by a selected number of positions at the outputs and wherein the number n of messages contained in the set is selectable from among a plurality of values, by changing only select control signal inputs to the array of multiplexers.
The present invention is described with respect to particular exemplary embodiments thereof and reference is accordingly made to the drawings in which:
The present invention is directed toward a reconfigurable barrel rotator circuit. Embodiments of the present invention are particularly useful in QC-LDPC decoders and, more particularly, in interconnection networks for QC-LDPC decoders. It will be apparent that embodiments of the present invention can be applied in other contexts where shifting and rotating data is useful, such as arithmetic operations, variable-length coding, and bit-indexing.
The modulator 108 can then prepare the codewords for transmission by modulating one or more carrier signals in accordance with the codewords. As an example, the modulation can be performed in accordance with orthogonal frequency division multiplexing (OFDM). Each modulated and encoded signal can then be transmitted via a communication channel 110. The channel 110 can be, for example, a wireless communication channel which can be, for example, part of a wireless local area network (WLAN).
A receiver 112 receives the transmitted signal from the channel 110. The receiver 112 can include a demodulator 114 and a decoder 116. The demodulator 114 demodulates the received signal so as to reconstruct the codewords. The codewords can then be decoded by the decoder 116 in order to reconstruct the original data 102. While the decoder 116 can correct certain errors introduced by the communication process, the data 118 output from the decoder 116 can differ from the original data 102 due to uncorrected errors that remain.
A control unit 208 controls operation of the decoding unit 200. For this purpose, the control unit 208 includes a VPUs controller 210, a shifter controller 212 and a CPUs controller 214. The VPUs controller 210 controls operation of the VPUs of the variable node processing unit 202. The shifter controller 212 controls operation of the permutation network 204, and the CPUs controller 214 controls operation of the CPUs of the check node processing unit 206.
Elements of the decoding unit 200 are implemented with hardware circuitry, which can include memory, registers, logic circuitry, general-purpose and/or specialized processors, machine-readable software, application specific integrated circuits (ASICs), programmable logic arrays (PLAs), and so forth. It will be apparent that the particular arrangement of the decoding unit 200 is exemplary and that embodiments of the present invention can be employed in conjunction with other decoder arrangements and architectures.
Sum-product message passing and its approximations, such as min-sum, normalized min-sum and offset min-sum algorithms, can be performed by the decoding unit 200. In this case, probabilistic information, e.g. in the form of log-likelihood ratios (LLRs), can be passed between the VPUs and the CPUs by the permutation network 204. The decoding unit 200 can be initialized with LLRs that pertain to bits of a codeword received from the communication channel 110. Decoding is performed through an iterative process of information exchange between VPUs and CPUs. This procedure is called belief propagation. In the traditional fully parallel, two-phase decoding schedule, the calculation of all check nodes follows the calculation of all the variable nodes and vice versa. For each half-iteration, the VPUs take inputs from the CPUs and compute outputs for the CPUs. For the next half-iteration, the CPUs take inputs from the VPUs and compute outputs for the VPUs. These iterations can be repeated until an estimated codeword is found or some other stopping criterion is reached. In layered decoding, the rows of the parity check matrix H are processed in subsets, or layers, in successive order. Each layered decoding iteration is divided into a sequence of sub-iterations. For each half-sub-iteration, the VPUs take inputs from the CPUs and compute outputs for the CPUs. For the next half-sub-iteration, the CPUs take inputs from the VPUs and compute outputs for the VPUs.
For the case of QC-LDPC codes, the permutation network 204 should be a circuit able to circularly shift, i.e., to rotate its inputs. Such a circuit realizes the connectivity determined by a circularly shifted identity sub-matrix Hx, where the inputs of the circuit correspond to the columns of Hx, the outputs correspond to the rows of Hx, and a 1 in column h and row g indicates a connection between the input h and output g of the circuit. With reference to
An LDPC decoder architecture suitable for the 802.11n/ac IEEE standards would ideally support three codeword lengths (648, 1296, 1944) and four coding rates (1/2, 2/3, 3/4, 5/6) in order to implement the twelve LDPC parity check matrices defined by the standard. To accommodate multiple codes of different characteristics, such as length, rate, and check degrees of each parity check matrix, there is a need for a reconfigurable interconnection network that efficiently realizes connectivity between variable and check node processors of a decoder. The challenge lies in the fact that the same circuit should be capable of interconnecting a variable number n of variable node processing elements with n check node processing elements, realizing the aforementioned cyclic-shift operations.
As shown in
In addition to the primary multiplexers described above, the barrel rotator circuit of
We define F to be a set of n input messages, s to be the rotate factor (s ε{0, 1, . . . , n−1}), r to be a binary value which indicates left or right rotation, where r=0 for left rotation and r=1 for right rotation, and G to be the set of the rotated output messages. An n-message rotate-left by s operation performs an n-message left rotation, setting the i-th message of F to the l-th position of G, where i, lε{0, 1, . . . , n−1} and l=mod((i−s), n). An n-message rotate-right by s operation performs an n-message right rotation, setting the i-th message of F to the l-th position of G, where i, lε{0, 1, . . . , n−1} and l=mod((i+s), n). For example, assuming a set F of n=9 messages, where F=[m0, m1, m2, m3, m4, m5, m6, m7, m8], s=5 and r=0, the result of the rotate-left operation in F is the set G=[m5, m6, m7, m8, m0, m1, m2, m3, m4]. A rotate-right by s=5 operation would give the set G=[m4, m5, m6, m7, m8, m0, m1, m2, m3].
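The rotate-left and rotate-right definitions above can be checked with a short Python sketch. The function name `rotate` is hypothetical; the index arithmetic follows the two mod expressions given in the text, and the n=9, s=5 example is reproduced as a check.

```python
def rotate(F, s, r):
    """Rotate the message list F by s positions.

    r = 0: rotate left,  message i moves to position mod(i - s, n).
    r = 1: rotate right, message i moves to position mod(i + s, n).
    """
    n = len(F)
    if not 0 <= s < n:
        raise ValueError("s must lie in {0, 1, ..., n-1}")
    if r not in (0, 1):
        raise ValueError("r must be 0 (left) or 1 (right)")
    G = [None] * n
    for i, message in enumerate(F):
        l = (i - s) % n if r == 0 else (i + s) % n
        G[l] = message
    return G


F = [f"m{i}" for i in range(9)]
left = rotate(F, 5, 0)
right = rotate(F, 5, 1)
print("rotate-left by 5: ", left)
print("rotate-right by 5:", right)
```

A rotate-right by s produces the same output as a rotate-left by n−s, which is why a single rotation direction in hardware is enough once the select value is adjusted.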
The barrel rotator 300 is reconfigurable to circularly shift different numbers n of input messages. Thus, the value of n can be changed by simply changing the values of the select logic control signals to the multiplexers. The embodiment shown in
The barrel rotators of
A rule that defines the interconnection of the multiplexers for the fixed barrel rotators is as follows:
Therefore, for example, referring to
Referring to
The reconfigurable barrel rotator circuit supports circular shifting of a set of n input messages applied to the array of inputs by a selected number of positions at the array of outputs and the number n of messages contained in the set is selectable from among a set D of d values where D=[z0, z1, . . . , zd-1].
In general, a reconfigurable permutation network able to support all these d configurations consists of a zd-1×zd-1 barrel rotator and a plurality of additional multiplexers, called secondary multiplexers. The primary multiplexers that compose the zd-1×zd-1 barrel rotator are divided into d groups, namely Group 0, Group 1, . . . , Group d−1. The aforementioned secondary multiplexers are placed at one of the inputs of all the primary multiplexers of the zd-1×zd-1 barrel rotator which belong to Group f, where fε{0, 1, . . . , d−2}, as follows. Group f consists of the primary multiplexers zf−2^k to zf−1 of stage k, where kε{0, 1, . . . , pf−1} and pf=┌log2zf┐. A Group f primary multiplexer j (jε{0, 1, . . . , zf−1}) of stage k (kε{0, 1, . . . , pf−1} and pf=┌log2zf┐) receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod(j+t, zd-1), where t=2^k, when the desired configuration is not zf; otherwise it receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod(j+t, zf). In this manner, a secondary multiplexer is placed before one of the inputs of each Group f multiplexer of the zd-1×zd-1 barrel rotator. All the secondary multiplexers of Group f are controlled by the signal S2f. Given a set D of d configurations [z0, z1, . . . , zd-1], d−1 select inputs S2f are required, where fε{0, 1, . . . , d−2}. In order to realize the zq×zq configuration, where qε{0, 1, . . . , d−1}, S2f must be 1 when f=q and 0 otherwise.
As an exemplary embodiment, the barrel rotator 300 of
The embodiment depicted in
Thus, referring to
Group 1 includes MUX50, MUX41, MUX51, MUX22, MUX32, MUX42, and MUX52. In general, Group 1 consists of the primary multiplexers z1−2^k to z1−1 of stage k, where kε{0, 1, . . . , p1−1} and p1=┌log2z1┐.
The remainder of the primary multiplexers is included in Group 2.
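The group membership rule can be enumerated with a few lines of Python. This is a sketch under two assumptions: the flattened expression for the first multiplexer of a group is read as z_f − 2^k, and a label MUXjk denotes multiplexer j of stage k. The helper names `ceil_log2` and `group_members` are hypothetical.

```python
def ceil_log2(z):
    """Ceiling of log2(z) for an integer z >= 1, without floating point."""
    return (z - 1).bit_length()


def group_members(D, f):
    """(j, k) pairs of the primary multiplexers in Group f, f in 0..d-2.

    Group f holds multiplexers z_f - 2**k .. z_f - 1 of stage k,
    for k in 0 .. ceil(log2(z_f)) - 1.
    """
    z_f = D[f]
    return [(j, k)
            for k in range(ceil_log2(z_f))
            for j in range(z_f - 2 ** k, z_f)]


D = [3, 6, 9]
for f in range(len(D) - 1):
    names = [f"MUX{j}{k}" for j, k in group_members(D, f)]
    print(f"Group {f}: " + ", ".join(names))
```

For D=[3, 6, 9] the Group 1 line lists the same seven multiplexers named in the text, and every multiplexer not listed falls into Group 2.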
For the primary multiplexers of Group 2, the same interconnection rule is used as in the single-mode case. Specifically, a Group 2 primary multiplexer j (jε{0, 1, . . . , z2−1}) of stage k (kε{0, 1, . . . , p2−1} and p2=┌log2z2┐) receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod(j+t, z2), where t=2^k.
A Group 0 primary multiplexer j (jε{0, 1, . . . , z0−1}) of stage k (kε{0, 1, . . . , p0−1} and p0=┌log2z0┐) receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod (j+t, z2), when the desired configuration z is not equal to z0, otherwise it receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod (j+t, z0). This rule implies the existence of some additional multiplexers, placed at one of the inputs of all the primary multiplexers of Group 0. These additional multiplexers are the secondary multiplexers 302, 304 and 306, depicted in
A Group 1 primary multiplexer j (jε{0, 1, . . . , z1−1}) of stage k (kε{0, 1, . . . , p1−1} and p1=┌log2z1┐) receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod (j+t, z2), when the desired configuration z is not equal to z1, otherwise it receives as inputs the outputs of the previous stage primary multiplexers (or the primary inputs in case of k=0) j and mod (j+t, z1). Respectively, this rule implies the existence of some additional multiplexers, placed at one of the inputs of all the primary multiplexers of Group 1. These additional multiplexers are the secondary multiplexers 308, 310, 312, 314, 316, 318 and 320. Secondary multiplexers 308, 310, 312, 314, 316, 318 and 320 are placed in one of the inputs of the primary multiplexers MUX50, MUX41, MUX51, MUX22, MUX32, MUX42, and MUX52, respectively. These secondary multiplexers receive the control signal S21.
Referring to
In another example where n=6, we assume a rotate factor s equal to 2 and r=0. It is b=|(n·r)−s|=|(6·0)−2|=2. For this 6×6 configuration, it is S20=0 and S21=1. The 4-bit binary representation of 2 is 0010. We assign the binary value bk to the input S1k, realizing the desired connectivity. Select signal S10 is equal to 0, so MUX00 allows input X10 to pass through, unlike MUX80. Select signal S11 is equal to 1, S20=0 and S21=1, so the output of MUX00 passes through MUX41 and MUX71, unlike MUX01 and 304. Thereafter, the output of MUX41 passes through MUX42 and MUX43, since S12 and S13 are equal to 0. Similarly, the output of MUX71 passes through MUX72 and MUX73. Therefore, the input X10 is connected to the outputs X24 and X27 for this particular configuration. Respectively, input X11 is connected to outputs X25 and X28, input X12 is connected to output X20, input X13 is connected to output X21, input X14 is connected to output X22, input X15 is connected to output X23, input X16 is not connected to any output, input X17 is not connected to any output and input X18 is connected to output X26. Since this is a 6×6 configuration, the inputs X16, X17, X18 and the outputs X26, X27, X28 are not used. With respect to the aforementioned notation it is F=[X10, X11, X12, X13, X14, X15] and G=[X12, X13, X14, X15, X10, X11], which is the desired connectivity. In case where r=1, the procedure is similar (b=4) and the performed operation is rotate-right.
In another example where n=3, we assume a rotate factor s equal to 2 and r=1. It is b=|(n·r)−s|=|(3·1)−2|=1. For this 3×3 configuration, it is S20=1 and S21=0. The 4-bit binary representation of 1 is 0001. We assign the binary value bk to the input S1k, realizing the desired connectivity. Select signal S10 is equal to 1, S20=1 and S21=0, so secondary multiplexer 302, MUX20 and MUX80 allow input X10 to pass through, unlike MUX00 and secondary multiplexer 308. Thereafter, the output of MUX20 passes through MUX21, MUX22 and MUX23, since S11, S12 and S13 are equal to 0. Similarly, the output of MUX80 passes through MUX81, MUX82 and MUX83. Therefore, the input X10 is connected to the outputs X22 and X28. Respectively, input X11 is connected to output X20 and input X12 is connected to output X21. Since this is a 3×3 configuration, the inputs X13, X14, X15, X16, X17, X18 and the outputs X23, X24, X25, X26, X27, X28 are not used. With respect to the aforementioned notation it is F=[X10, X11, X12] and G=[X11, X12, X10], which is the desired connectivity for the rotate-right by 2 operation.
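The worked examples can be cross-checked with a behavioural Python sketch of the multiplexer array. It is a functional model only, not a description of the circuit's timing or layout, and it rests on stated assumptions: the stage offset is t=2^k, bit k of b drives select input S1k, and a multiplexer uses the modulus z_f only when it belongs to Group f and configuration z_f is selected. The reduction of b modulo n is an added guard for the case s=0, r=1 and is not taken from the text. The names `simulate` and `ceil_log2` are hypothetical.

```python
def ceil_log2(z):
    """Ceiling of log2(z) for an integer z >= 1."""
    return (z - 1).bit_length()


def simulate(D, n, s, r, inputs):
    """Return all z_max outputs of the array for configuration n x n."""
    z_max = D[-1]
    if n not in D:
        raise ValueError("n must be one of the supported configurations")
    if len(inputs) != z_max:
        raise ValueError("the array always has z_max inputs")
    # Select value from the text, b = |n*r - s|; "% n" is an added guard.
    b = abs(n * r - s) % n
    p_n = ceil_log2(n)
    current = list(inputs)
    for k in range(ceil_log2(z_max)):
        t = 2 ** k
        select = (b >> k) & 1  # value applied to S1k
        following = []
        for j in range(z_max):
            # Secondary multiplexers of the selected group switch the wrap-around
            # point from z_max to n (their select signal S2f is 1).
            in_selected_group = n != z_max and k < p_n and n - t <= j <= n - 1
            modulus = n if in_selected_group else z_max
            shifted = current[(j + t) % modulus]
            following.append(shifted if select else current[j])
        current = following
    return current


X1 = [f"X1{i}" for i in range(9)]
out_6 = simulate([3, 6, 9], n=6, s=2, r=0, inputs=X1)
out_3 = simulate([3, 6, 9], n=3, s=2, r=1, inputs=X1)
print("6x6, rotate-left by 2: ", out_6[:6])
print("3x3, rotate-right by 2:", out_3[:3])
```

Only the first n outputs are meaningful in an n×n configuration; the remaining outputs still carry values (for example, X10 also appears on the outputs with indices 4 and 7 in the 6×6 case), but they are not used.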
Thus, the barrel rotator 300 of
Generally, for a z×z reconfigurable barrel rotator embodiment, in the case of an n×n configuration (n≦z) the (┌log2z┐−┌log2n┐) most significant bits of the binary representation of b are equal to zero. This is because the first ┌log2n┐ stages realize the desired shifting operation and the remaining (┌log2z┐−┌log2n┐) stages are essentially bypassed.
In another exemplary embodiment, the barrel rotator of
The barrel rotator 1100 is reconfigurable to: (1) circularly shift a group of three input messages (i.e., n=3); (2) circularly shift a group of four input messages (i.e., n=4); and (3) circularly shift a group of nine input messages (i.e., n=9).
Referring to
Embodiments of the present invention can be extended to any set D of d values, D=[z0, z1, . . . , zd-1], where 2≦z0<z1< . . . <zd-1.
The aforementioned technique for implementing reconfigurable barrel rotators ensures that every primary multiplexer belongs to at most one group for any set D of d values, D=[z0, z1, . . . , zd-1], where 2≦z0<z1< . . . <zd-1, under the following condition:
zk≧zk-1+2^(┌log2zk-1┐−1), (1)
For example, assuming d=3 and z0=3 (z1=zd-2), the following condition is derived: z1≧z0+2^(┌log2z0┐−1)=3+2^1=5.
In the case that condition (1) is not met, there exist primary multiplexers that belong to two or more groups. These multiplexers are part of the stage (┌log2zk-1┐−1) if and only if condition (1) is violated for zk. A violation of condition (1) may lengthen the critical path of the barrel rotator circuit and thus increase its maximum delay.
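Whether a given set D leads to shared multiplexers can be tested directly from the group definition, without relying on the closed-form condition. The Python sketch below assumes that Group f holds multiplexers z_f − 2^k to z_f − 1 of stage k and considers only Groups 0 to d−2, i.e. the groups that carry secondary multiplexers. The function names are hypothetical.

```python
from collections import defaultdict


def ceil_log2(z):
    """Ceiling of log2(z) for an integer z >= 1."""
    return (z - 1).bit_length()


def shared_multiplexers(D):
    """Map (j, k) -> list of groups, for multiplexers in two or more groups."""
    owners = defaultdict(list)
    for f in range(len(D) - 1):  # Groups 0 .. d-2 only
        z_f = D[f]
        for k in range(ceil_log2(z_f)):
            for j in range(z_f - 2 ** k, z_f):
                owners[(j, k)].append(f)
    return {mux: groups for mux, groups in owners.items() if len(groups) > 1}


for D in ([3, 6, 9], [3, 4, 9], [27, 54, 81]):
    print(D, "->", shared_multiplexers(D))
```

For D=[3, 4, 9] the single shared multiplexer is multiplexer 2 of stage 1, which is stage ┌log2 3┐−1, in line with the statement above about where such multiplexers appear.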
Assuming D=[z0, z1, . . . , zd-1], we define Wjk to be a metric associated with each primary multiplexer MUXjk. We define Wjk=1 when the primary multiplexer MUXjk belongs to Group d−1, and Wjk=┌log2(w+1)┐+1, when the primary multiplexer MUXjk belongs to a number of w Groups other than Group d−1. This is the delay, measured in terms of multiplexer count, introduced by the primary multiplexer MUXjk and its secondary multiplexers. For example, in the embodiment of
The upper bound CLBR of the number of multiplexers (primary and secondary) that comprise the critical path of the reconfigurable barrel rotator is given by the following equation:
CLBR=max(Wj0)+max(Wj1)+ . . . +max(Wjp), (2)
where jε{0, 1, . . . , zd-1−1} and p=┌log2zd-1┐.
In the case that condition (1) is not met for at least one multiplexer of stage k, the value max(Wjk), which represents the maximum delay of this specific stage, is increased, with a consequent increase of the CLBR value. Although there is not a one-to-one relationship between the critical path delay and the CLBR value, CLBR gives an upper bound of this delay. By choosing values of zk that satisfy all conditions derived from (1), the critical path delay through the reconfigurable barrel rotator is minimized.
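The metric Wjk and the bound CLBR can be evaluated with a short Python sketch. It assumes that the stages are indexed 0 to ┌log2 zd-1┐−1, that a multiplexer outside Groups 0 to d−2 belongs to Group d−1 and so has Wjk=1, and that group membership follows the z_f − 2^k rule. The function names are hypothetical.

```python
def ceil_log2(z):
    """Ceiling of log2(z) for an integer z >= 1."""
    return (z - 1).bit_length()


def stage_maxima(D):
    """Per-stage maximum of W_jk, counted in multiplexers."""
    z_max = D[-1]
    group_count = {}  # (j, k) -> number w of groups (other than d-1)
    for z_f in D[:-1]:
        for k in range(ceil_log2(z_f)):
            for j in range(z_f - 2 ** k, z_f):
                group_count[(j, k)] = group_count.get((j, k), 0) + 1
    maxima = []
    for k in range(ceil_log2(z_max)):
        worst = 1  # W_jk = 1 for a Group d-1 multiplexer
        for j in range(z_max):
            w = group_count.get((j, k), 0)
            if w > 0:
                # ceil(log2(w + 1)) equals w.bit_length() for w >= 1
                worst = max(worst, w.bit_length() + 1)
        maxima.append(worst)
    return maxima


def critical_path_bound(D):
    """Upper bound CLBR: sum of the per-stage maxima."""
    return sum(stage_maxima(D))


for D in ([3, 6, 9], [3, 4, 9]):
    print(D, stage_maxima(D), critical_path_bound(D))
```

The per-stage list makes it visible where a shared multiplexer adds delay: for D=[3, 4, 9] the stage-1 maximum rises to 3 because one multiplexer of that stage belongs to two groups.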
In general, the number Ms of the additional secondary multiplexers required for a reconfigurable barrel rotator embodiment, for a set D of d values [z0, z1, . . . , zd-1], is given by the following equation:
Ms=(2^┌log2z0┐−1)+(2^┌log2z1┐−1)+ . . . +(2^┌log2zd-2┐−1) (3)
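Equation (3) can be evaluated with a one-line sum. The Python sketch below assumes each term is read as 2 raised to the power ┌log2 z_f┐, minus 1, for f from 0 to d−2; the function names are hypothetical.

```python
def ceil_log2(z):
    """Ceiling of log2(z) for an integer z >= 1."""
    return (z - 1).bit_length()


def secondary_multiplexer_count(D):
    """Ms of equation (3): one term per configuration except the largest."""
    return sum(2 ** ceil_log2(z) - 1 for z in D[:-1])


for D in ([3, 6, 9], [3, 4, 9]):
    print(D, "-> Ms =", secondary_multiplexer_count(D))
```

Each term equals the number of multiplexers in the corresponding group, since stage k contributes 2^k multiplexers for k from 0 to ┌log2 z_f┐−1.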
Equation (3) gives a number of Ms=(2^2−1)+(2^3−1)=3+7=10 secondary multiplexers for the case of the barrel rotator 300 circuit and a number of Ms=(2^2−1)+(2^2−1)=3+3=6 secondary multiplexers for the case of the barrel rotator 1100 embodiment. These values are verified by the circuits shown in
In the case of the 802.11n/ac IEEE standards the number of inputs/outputs of the permutation network should be 27, 54 or 81. The transition from an embodiment of the aforementioned architecture with D=[3, 6, 9] to an embodiment with D=[27, 54, 81] is straightforward, particularly considering that condition (1) is met for the [27, 54 and 81] configuration.
While the reconfigurable barrel rotators 300 and 1100 are shown comprising an array of multiplexers, the barrel rotators 300 and 1100 can be implemented as application specific integrated circuits (ASICs), programmable logic arrays (PLAs), an arrangement of discrete components or other hardware devices.
The present invention can be extended to embodiments able to perform other shift operations as well, such as logical shifts and arithmetic shifts.
The foregoing detailed description of the present invention is provided for the purposes of illustration and is not intended to be exhaustive or to limit the invention to the embodiments disclosed. Accordingly, the scope of the present invention is defined by the appended claims.