The present invention relates to a data shifter and a control method thereof, a multiplexer, a data sifter, and a data sorter, and in particular to, but not limited to, a data spreading shifter and a data stuffing shifter.
The required processing speed of digital circuits is increasing year by year. However, improvements in the clock frequency of baseband chips have been slower than increases in the required processing speed. Accordingly, parallel processing techniques for baseband chips have been studied in order to improve their processing speed.
Vector processing is a key technique for realizing parallel processing. Insertion and removal of data elements depending upon mask bits play an important role in the implementation of vector processing.
GB 2 370 384 A discloses an N-bit shifter which receives as its input a sequence of N bits x_0 . . . x_{N−1} and gives as its output a plurality of bits z_0 . . . z_{N−1} representing a selected permutation, transposition, or rearrangement of the input bits. This shifter can be constructed with a circuit size of O(N log N), and can perform the data spreading/stuffing shift in O(log N) steps.
The shifter of GB 2 370 384 A includes a memory and N one-bit slices of multiplexers. First, N bits of input data are stored into the memory. Next, each slice receives, as its input, the single bit of data stored in the memory area corresponding to the slice and at least one bit of data stored in other memory areas, and selects one of the input bits in accordance with a selection signal. More specifically, for 0 ≤ i < N, the slice #i receives the bit stored in the memory area #i, which corresponds to the slice #i, and the bits stored in the memory areas #(i±2^k) (k: nonnegative integer), and then selects and outputs one of the input bits in accordance with the selection signal. In each processing cycle, the N slices each perform this operation, and the N bits output by the N slices are then stored in the memory. The N slices repeat the same operation on the stored N bits until the desired permutation, transposition, or rearrangement of the input bits is achieved.
GB 2 370 384 A discloses an embodiment of the shifter that operates as a data stuffing shifter where, for k = 0, 1, . . . , (log_2 N)−1 and for i = 0, . . . , N−1, at the (k+1)th processing cycle, the slice #i selects and outputs either the bit stored in the memory area #i, which corresponds to the slice #i, or the bit stored in the memory area #(i±2^k). This shifter requires only O(log N) processing steps, and the circuit size is O(N log N). GB 2 370 384 A also discloses an embodiment of the shifter as a data spreading shifter with O(log N) processing steps based on a similar idea. In addition, GB 2 370 384 A discloses the possibility of constructing a cascade of O(log N) pluralities of N slices, which allows a “select” to be carried out in one single step.
The data spreading/stuffing shifter described in GB 2 370 384 A requires input of a selection signal into each slice every processing cycle. However, it would be burdensome to determine proper selection signals to be input into the slices for each processing cycle. This is because the shifter of GB 2 370 384 A repeatedly performs bit selection at each slice, writes the selected bits into the memory, and performs the bit selection on the bits stored in the memory again.
Therefore, the processing load of determining the proper selection signals can become a “bottleneck” in a series of signal processing operations. GB 2 370 384 A also discloses a cascade of slices to improve the processing speed. However, a simple implementation of the cascade requires a large processing circuit of size O(N log^2 N).
Accordingly, the present invention provides a technology for achieving a fast, easily controlled data spreading/stuffing shifter implementable with small circuit size.
According to one aspect of the present invention, a data shifter that performs data shift operations on N-lane data sequences is provided. The data shifter includes a plurality of stages each of which includes N elemental units. The mth elemental unit, which is included in the pth stage, is preliminarily assigned a predetermined one-bit value c and a positive integer q, and includes
means for inputting target data to be processed whose size is greater than or equal to one bit;
means for inputting destination data representing a lane number of a lane where Data(p,m), a logical OR of the input target data, should be routed to, the size of the destination data being ⌈log_2 N⌉ bit(s);
means for comparing the qth bit from the least significant bit of Des(p,m), a logical OR of the input destination data, with the one-bit value c; and
means for outputting, based on the comparison result, both one of Data(p,m) and the value 0 as the target data and one of Des(p,m) and the value 0 as the destination data bound for the mth elemental unit included in the next stage, and if m−1+2^(q−1) < N, further outputting both the other of Data(p,m) and the value 0 as the target data and the other of Des(p,m) and the value 0 as the destination data bound for the (m+2^(q−1))th elemental unit included in the next stage.
The data shifter inputs both the N-lane data sequences to be processed as the target data and the destination data of each data sequence into the N elemental units included in the first stage respectively, and outputs, as shifted output data of the mth lane, a logical OR of the target data which the elemental units included in the last stage output bound for the mth elemental unit included in the next stage.
We can construct a data spreading/stuffing shifter according to the present invention that includes a control circuit whose size is O(N log N) and that requires only O(1) processing steps. Thus, the present data shifter is exceedingly efficient compared to GB 2 370 384 A. In addition, predetermined parameters are preliminarily assigned to each elemental unit, which allows easy control of the data shifter and implementation of the shifter with little effort.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Embodiments of the present invention will now be described with reference to the attached drawings. Each embodiment described below will be helpful in understanding a variety of concepts from the generic to the more specific. It should be noted that the technical scope of the present invention is defined by claims, and is not limited by each embodiment described below. In addition, not all combinations of the features described in the embodiments are always indispensable for the present invention.
The data shifter according to an embodiment of the present invention is based on a barrel shifter constructed with a number of stages of binary multiplexers. The spreading/stuffing shifter is realized by controlling each of a plurality of switches in the multiplexers.
The data stuffing shifter can be constructed as in
The above data spreading/stuffing shifter can be constructed with the circuit size of O(N log N). Note that if the order of multiplexer stages is reversed (i.e., swapping the data spreading shifter and data stuffing shifter), a collision of routing resources could occur.
The structure for data lanes of a spreading/stuffing shifter according to the embodiment of the present invention has been described above as a basic concept. Now, a description will be provided regarding how switches may be controlled and how collisions of routing resources may be avoided.
Let us assume that the number of the stages of multiplexers is M and that an input lane #u is to be shifted to an output lane #v. Then, the difference Δ between the input lane number and the output lane number can be represented in binary as
Δ = Σ_{n=0}^{M−1} b_n·2^n, with b_n ∈ {0, 1}.
Therefore, the routing of the signal can be performed by setting the switch #S_n(u,v) to b_n. Here, the switch #S_n(u,v) shifts its input data by 2^n lanes if it is set to 1; otherwise it does not shift and outputs the input data as it is. In other words, the switches shift their input data by 2^n lanes if b_n is 1.
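The relation between the lane difference and the per-stage switch settings can be illustrated with a minimal sketch. The function names `stage_settings` and `shifts_taken` are hypothetical and do not appear in the embodiment; the sketch only assumes that the magnitude of the lane difference is written in binary, with bit n controlling the stage that shifts by 2^n lanes.

```python
def stage_settings(u: int, v: int, num_stages: int) -> list[int]:
    """Return [b_0, ..., b_{M-1}] such that sum(b_n * 2**n) == |u - v|.

    u and v are the input and output lane numbers; num_stages is M.
    (Hypothetical helper, for illustration only.)
    """
    delta = abs(u - v)
    if delta >= 1 << num_stages:
        raise ValueError("lane difference does not fit in the given number of stages")
    # Bit n of the difference is the setting of the stage that shifts by 2**n lanes.
    return [(delta >> n) & 1 for n in range(num_stages)]


def shifts_taken(settings: list[int]) -> list[int]:
    """List the shift distances (in lanes) of the stages whose switch is set to 1."""
    return [1 << n for n, b in enumerate(settings) if b]


settings = stage_settings(u=6, v=1, num_stages=3)
print(settings)                # [1, 0, 1]
print(shifts_taken(settings))  # [1, 4]
```

In this example the lane difference is 5, so the stages shifting by 1 lane and by 4 lanes are activated and the stage shifting by 2 lanes passes its input through.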
Mathematically, it is possible to prove that the data can be routed without any collision of routing resources when we use a certain ordering of the multiplexer stages. For this routing, it is possible to prove that the collision of routing resources will not occur for the following two routes.
a) from input lane #u to output lane #v.
b) from input lane #u+1 to output lane #v+1+a (a ≥ 0).
Let us assume that β and γ are integers and that:
u−v = 2^n·β + γ (0 ≤ γ < 2^n). Then, it follows:
In the same manner, we can prove routing resource collisions cannot occur.
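The dependence of collisions on the stage ordering can also be checked by brute force for a small number of lanes. The following sketch is not the proof referred to above. It assumes a spreading pattern in which packed input lanes 0, 1, ... are routed upward to strictly increasing output lanes, and it assumes that the collision-free ordering for this pattern applies the largest shift first. The names `route_positions`, `has_collision`, and `spreading_patterns` are hypothetical.

```python
from itertools import combinations


def route_positions(u, v, order):
    """Lane occupied after each stage when data moves upward from lane u to lane v.

    `order` lists the bit positions n (shift distance 2**n) in stage order.
    """
    delta = v - u
    pos, trace = u, []
    for n in order:
        if (delta >> n) & 1:
            pos += 1 << n
        trace.append(pos)
    return trace


def has_collision(routes, order):
    """True if two routes occupy the same lane after some stage."""
    traces = [route_positions(u, v, order) for u, v in routes]
    return any(len(set(stage)) < len(stage) for stage in zip(*traces))


def spreading_patterns(n_lanes):
    """All routings of packed inputs 0..k-1 to strictly increasing output lanes."""
    for k in range(1, n_lanes + 1):
        for dests in combinations(range(n_lanes), k):
            yield list(zip(range(k), dests))


msb_first = [2, 1, 0]  # assumed ordering for 8 lanes: shift by 4, then 2, then 1
print(all(not has_collision(p, msb_first) for p in spreading_patterns(8)))  # True

# The reversed ordering collides on a small example with 4 lanes.
print(has_collision([(0, 1), (1, 3)], order=[0, 1]))  # True
print(has_collision([(0, 1), (1, 3)], order=[1, 0]))  # False
```

With the smallest shift first, the data from lane 0 moves to lane 1 in the first stage while the data from lane 1 is still waiting there, which is the kind of routing resource collision the ordering is meant to avoid.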
In the basic control method of switches described with reference to
Accordingly, we introduce an elemental unit 20, as depicted in
As shown in
The input circuit 22 inputs destination data representing a lane number of the lane to which Data(p,m) should be routed. The size of the destination data is ⌈log_2 N⌉ bit(s). We represent the destination data input into the elemental unit 20 #m in the stage #p as Destination(p,m) or Des(p,m). The input circuit 23 inputs one-bit enabler signals. When the input circuit 23 inputs a zero bit as the enabler signal, the elemental unit 20 and its subsequent elemental units are disabled. We represent the enabler signal input into the elemental unit 20 #m in the stage #p as Enable(p,m).
Each elemental unit 20 is preliminarily assigned a predetermined one-bit value c and a nonnegative integer q. The bit length of the integer q is ⌈log_2⌈log_2 N⌉⌉. The elemental unit 20 compares the bit #q from the least significant bit (LSB) of Des(p,m), a logical OR of the input destination data, with the value c. Then, the elemental unit 20 outputs, based on the comparison result, both (i) one of the Data(p,m) value and the value 0 as the target data and (ii) one of the Des(p,m) value and the value 0 as the destination data bound for the elemental unit #m in the next stage. In addition, if m+2^q < N, the elemental unit 20 further outputs both the other of the Data(p,m) value and the value 0 as the target data and the other of the Des(p,m) value and the value 0 as the destination data bound for the elemental unit #(m+2^q) in the next stage.
More specifically, the elemental unit 20 according to the present embodiment includes an exclusive OR circuit 24, a plurality of AND circuits 31-38, and a plurality of output circuits 25-30. The exclusive OR circuit 24 performs the exclusive OR arithmetic operation on the bit #q of the Des(p,m) value and the value c, and outputs the resulting bit to the AND circuit 31 and the inverted resulting bit to the AND circuit 32. The AND circuit 31 performs the AND arithmetic operation on the Enable(p,m) value and the output of the exclusive OR circuit 24, and outputs the result to each of the AND circuits 33-35. Similarly, the AND circuit 32 performs the AND arithmetic operation on the Enable(p,m) value and the inverse of the output of the exclusive OR circuit 24, and outputs the result to each of the AND circuits 36-38.
The AND circuit 33 performs the AND arithmetic operation on each bit of Data(p,m) and the output of the AND circuit 31, and outputs the result to the output circuit 25. Similarly, the AND circuit 34 performs the AND arithmetic operation on each bit of Des(p,m) and the output of the AND circuit 31, and outputs the result to the output circuit 26. The AND circuit 35 performs the AND arithmetic operation on Enable(p,m) and the output of the AND circuit 31, and outputs the result to the output circuit 27. Note that if m+2^q < N, the output circuit 25 transfers the output of the AND circuit 33 as the target data bound for the elemental unit #(m+2^q) in the next stage. If m+2^q ≥ N, the output circuit 25 is terminated. Similarly, if m+2^q < N, the output circuits 26 and 27 transfer the outputs of the AND circuits 34 and 35 as the destination data and the enabler signal, respectively, bound for the elemental unit #(m+2^q) in the next stage. If m+2^q ≥ N, the output circuits 26 and 27 are terminated.
Similar to the AND circuit 33, the AND circuit 36 performs the AND arithmetic operation on each bit of Data(p,m) and the output of the AND circuit 32, and outputs the result to the output circuit 28. Similarly, the AND circuit 37 performs the AND arithmetic operation on each bit of Des(p,m) and the output of the AND circuit 32, and outputs the result to the output circuit 29. The AND circuit 38 performs the AND arithmetic operation on Enable(p,m) and the output of the AND circuit 32, and outputs the result to the output circuit 30. The output circuit 28 transfers the output of the AND circuit 36 as the target data bound for the elemental unit #m in the next stage. Similarly, the output circuits 29 and 30 transfer the outputs of the AND circuits 37 and 38 as the destination data and the enabler signal, respectively, bound for the elemental unit #m in the next stage.
In this way, the elemental unit 20 #m in the stage #p according to the embodiment of the present invention produces its output in one of two ways, depending upon whether or not the bit #q from the least significant bit of Des(p,m) matches the bit value c:
(i) if the bit #q from the least significant bit of Des(p,m) matches the value c, both Data(p,m) as the target data and Des(p,m) as the destination data are output bound for the elemental unit #m included in the next stage. If m+2^q < N, the elemental unit 20 further outputs the value 0 as both the target data and the destination data bound for the elemental unit #(m+2^q) included in the next stage. Otherwise, (ii) if the bit #q from the least significant bit of Des(p,m) does not match the value c, the elemental unit 20 outputs the value 0 as both the target data and the destination data bound for the elemental unit #m included in the next stage, and if m+2^q < N, further outputs both Data(p,m) as the target data and Des(p,m) as the destination data bound for the elemental unit #(m+2^q) included in the next stage.
As an operational example, if the input circuit 23 inputs Enable(p,m)=0, all of the AND circuits 33-38 output “0” to the output circuits 25-30. Therefore, the elemental unit 20 and its subsequent elemental units, which input 0 (the output of the AND circuit 35 or 38) as the enabler signal, are disabled.
In contrast, if the input circuit 23 inputs Enable(p,m)=1, and if the bit #q of Des(p,m) matches the value c, the output of the exclusive OR circuit 24 is 0, and thus the output of the AND circuit 31 is 0 while the output of the AND circuit 32 is 1. Therefore, in such a case, all of the output circuits 25-27 output 0 while the output circuits 28-30 output Data(p,m), Des(p,m), and Enable(p,m), respectively. If the input circuit 23 inputs Enable(p,m)=1, and if the bit #q of Des(p,m) does not match the value c, the output of the exclusive OR circuit 24 is 1, and thus the output of the AND circuit 31 is 1 while the output of the AND circuit 32 is 0. Therefore, in such a case, the output circuits 25-27 output Data(p,m), Des(p,m), and Enable(p,m), respectively, while all of the output circuits 28-30 output 0.
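The gating behavior of a single elemental unit 20 described above can be modeled in software as a minimal sketch. The function name `elemental_unit` and its tuple-based interface are hypothetical, and the bit #q is assumed to be counted from zero at the least significant bit. Termination of the shifted outputs when m+2^q ≥ N is left to the caller.

```python
def elemental_unit(data: int, des: int, enable: int, c: int, q: int):
    """Model one elemental unit; returns (straight, shifted) output triples.

    Each triple is (target data, destination data, enabler signal).
    `straight` is bound for unit #m in the next stage (output circuits 28-30),
    `shifted` is bound for unit #(m + 2**q) (output circuits 25-27).
    """
    mismatch = ((des >> q) & 1) ^ c          # exclusive OR circuit 24
    shift_gate = enable & mismatch           # AND circuit 31
    straight_gate = enable & (mismatch ^ 1)  # AND circuit 32

    def gated(gate: int):
        # AND circuits 33-35 / 36-38: pass the signals only when the gate is 1.
        return (data if gate else 0, des if gate else 0, enable & gate)

    return gated(straight_gate), gated(shift_gate)


# Bit #0 of Des = 5 is 1. With c = 1 it matches, so the data goes straight on.
print(elemental_unit(0xAB, des=5, enable=1, c=1, q=0))  # ((171, 5, 1), (0, 0, 0))
# With c = 0 it does not match, so the data is shifted by 2**0 lanes.
print(elemental_unit(0xAB, des=5, enable=1, c=0, q=0))  # ((0, 0, 0), (171, 5, 1))
# A disabled unit outputs zeros on both paths.
print(elemental_unit(0xAB, des=5, enable=0, c=0, q=0))  # ((0, 0, 0), (0, 0, 0))
```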
As already described, the data shifter 10 according to the present embodiment includes a plurality of stages, each of which includes N elemental units 20 in a matrix pattern to perform data shift operations on N-lane data sequences. The data shifter 10 inputs both the N-lane data sequences to be processed as the target data and the destination data of each said data sequence into the N elemental units included in the first stage. Then, the data shifter 10 outputs, as shifted output data of the lane #m, a logical OR of the target data which the elemental units included in the last stage output bound for the elemental unit #m included in the next stage.
As will be plain to those skilled in the art, the assignment of the values c and q determines the operations of the elemental units 20 and the data shifter 10, which includes the plurality of the elemental units.
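To make the role of the assignment concrete, the following sketch simulates a whole data shifter built from such units. The assignment used here is only one hypothetical choice and is not taken from the embodiment: stage p uses q = p, and the unit in lane m uses c equal to bit #q of m. Under this assumption, data is routed correctly when every destination lane number is greater than or equal to the source lane number and no two routes collide, as in the example where scattered data is packed toward the highest lanes. The names `unit`, `merge`, and `run_shifter` are likewise hypothetical.

```python
def unit(d, t, e, c, q):
    """One elemental unit: returns (straight, shifted) triples (data, des, enable)."""
    mismatch = ((t >> q) & 1) ^ c
    go, stay = e & mismatch, e & (mismatch ^ 1)
    return (d * stay, t * stay, stay), (d * go, t * go, go)


def merge(a, b):
    """Logical OR of the signals arriving at the same unit in the next stage."""
    return tuple(x | y for x, y in zip(a, b))


def run_shifter(data, des, enable):
    """Simulate all stages and return the shifted output data of each lane."""
    n = len(data)
    num_stages = max(1, (n - 1).bit_length())  # equals ceil(log2 N) for N >= 2
    lanes = list(zip(data, des, enable))
    for q in range(num_stages):                # assumption: stage p uses q = p
        nxt = [(0, 0, 0)] * n
        for m, (d, t, e) in enumerate(lanes):
            c = (m >> q) & 1                   # assumption: c = bit #q of lane m
            straight, shifted = unit(d, t, e, c, q)
            nxt[m] = merge(nxt[m], straight)
            if m + (1 << q) < n:               # otherwise the output is terminated
                nxt[m + (1 << q)] = merge(nxt[m + (1 << q)], shifted)
        lanes = nxt
    return [d for d, _, _ in lanes]


data = [11, 0, 22, 0, 0, 33, 0, 0]
des = [5, 0, 6, 0, 0, 7, 0, 0]
enable = [1, 0, 1, 0, 0, 1, 0, 0]
print(run_shifter(data, des, enable))  # [0, 0, 0, 0, 0, 11, 22, 33]
```

Because c and q are fixed per unit, the simulation needs no per-cycle selection signals; the destination data carried along with each data element is sufficient to steer it.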
The data spreading shifter performs the shift of 2┌log
By introducing the elemental unit, we can construct a data spreading/stuffing shifter including a control circuit whose size is O(N log N), which is equal to that of GB 2 370 384 A. More specifically, the gate count of the data shifter according to the present embodiment is O(N log N), and the number of wires is O(N log N). Further, the data shifter according to the present embodiment requires only O(1) processing steps. Thus, the data shifter according to the present embodiment is exceedingly efficient compared to GB 2 370 384 A. In addition, the parameters c and q are preliminarily assigned to each elemental unit 20, and it is unnecessary to control the operations of the elemental units according to changes in the operational state of the data shifter 10. This allows easy control of the data shifter 10 and implementation of the shifter 10 with little effort.
The data spreading/stuffing shifter described above can be applied not only to just insertion or removal of data lane elements but also to various data processing applications. For example, the data spreading shifter according to the present embodiment allows easy implementation of a multiplexer for multiplexing multiple data sequences.
Another application of the data shifter according to the present embodiment is a data sifter for “sifting” each data element Data(m) included in an input data sequence into two groups based on a sort key K(m) corresponding to the data element and a predetermined decision function f(K(m)) which takes the sort key K(m) as the input and outputs a Boolean result.
In the example described above, the data stuffing shifter sifts a set of data elements into two groups based on a decision function f(K(m)) which outputs a Boolean result by comparing the sort key K(m) with a threshold value 0, but an arbitrary operation can be performed in the decision function. In addition, in the example described above, the data stuffing shifter sifts the data elements in the input data sequence based on the values of said data elements themselves, but the data sifting may be based on any sort key corresponding to the data elements. For example, if the input data sequence is a sequence of memory addresses, the data stuffing shifter may sift the data elements (memory addresses) based on the values of the data elements to which the memory addresses point.
Therefore, the data sifter may sift each data element Data(m) included in an input data sequence into two groups based on the sort key K(m) corresponding to said data element and a predetermined decision function f(K(m)) which takes the sort key K(m) as the input and outputs a Boolean result. With use of the data stuffing shifter according to the present embodiment, the data sifter may collect, from the data elements included in the input data sequence, the data elements whose corresponding sort key values make the decision function output “True”, in order to output a first data sequence. Further, with use of the data stuffing shifter according to the present embodiment, the data sifter may collect, from the data elements included in the input data sequence, the data elements whose corresponding sort key values make the decision function output “False”, in order to output a second data sequence. As in the previous example, the sort key corresponding to a given data element may be the value of said data element itself.
The destination lane number for the above stuffing shifter is calculated by counting the data already stuffed for each collection. That is, when we define the result of the decision for lane #m as d(m), with d(m)=0 for a positive value and d(m)=1 for a negative value, the destination Des(m) is determined as:
It should be noted that the data stuffing shifter for sifting the positive data elements and the stuffing shifter for sifting the negative data elements may be identical or may be provided separately. The computation of the logical OR may be implemented by at least one logical OR circuit. The circuit size of the data sifter based on the data stuffing shifter according to the present embodiment is O(N log N) and thus is very small.
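The idea of deriving destination lane numbers by counting the data already stuffed into each collection can be illustrated in software. This minimal sketch does not reproduce the formula of the embodiment. It assumes that the destination of lane #m is the number of earlier lanes with the same decision result d(m), and the names `sift_destinations` and `sift` are hypothetical.

```python
def sift_destinations(decisions):
    """For each lane, count the earlier lanes that have the same decision d(m).

    decisions[m] is d(m): 0 for the first collection, 1 for the second.
    """
    counts = [0, 0]
    des = []
    for d in decisions:
        des.append(counts[d])  # lanes already stuffed into this collection
        counts[d] += 1
    return des


def sift(values, decide):
    """Split `values` into two stuffed sequences using the decision function."""
    decisions = [1 if decide(v) else 0 for v in values]
    des = sift_destinations(decisions)
    groups = ([None] * decisions.count(0), [None] * decisions.count(1))
    for v, d, t in zip(values, decisions, des):
        groups[d][t] = v       # place the element in lane t of collection d
    return groups


values = [3, -1, 4, -5, 9]
print(sift_destinations([0, 1, 0, 1, 0]))  # [0, 0, 1, 1, 2]
print(sift(values, lambda v: v < 0))       # ([3, 4, 9], [-1, -5])
```

Here the decision function returns d(m)=1 for negative values, so the positive elements are stuffed into the first sequence and the negative elements into the second, each in their original order.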
One may construct a data sorter that sorts each data element included in an input data sequence by repeatedly sifting each output of the above described data sifter.
In this way, the data sorter according to the present embodiment sorts each data element included in an input data sequence. The data sorter first inputs each data element included in the input data sequence into the data sifter described above, and then performs control to repeatedly input each data element included in the two independent data sequences into the data sifter such that all of the data included in the input data sequence are sorted.
Thus, the full crossbar switch, which is an example of a data sorter, includes a plurality of data sifters. The plurality of data sifters includes one data sifter that inputs the input data sequence as a target data sequence. Each of the plurality of data sifters inputs a target data sequence, sifts the target data sequence into a first and a second data sequence based on the sort key preliminarily assigned to said data sifter, outputs the first and/or second data sequence, when it includes more than one data element, to one or more other data sifters as the target data sequence, and outputs the first and/or second data sequence, when it includes only one data element, as the sorting result.
One shifter is constructed with a circuit size of O(N log N), and the full crossbar switch and the data sorter can be constructed with a circuit size of O(N log^2 N).
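Sorting by repeated sifting can be sketched as a recursive partition. The sketch below assumes, purely for illustration, that the sifter at each level decides on one bit of a nonnegative integer key, starting from the most significant bit. The embodiment only states that each data sifter is preliminarily assigned a sort key, so this bitwise choice and the names `sift` and `sort_by_sifting` are hypothetical.

```python
def sift(values, bit):
    """Sift values into two order-preserving groups by the given key bit."""
    zeros = [v for v in values if not (v >> bit) & 1]
    ones = [v for v in values if (v >> bit) & 1]
    return zeros, ones


def sort_by_sifting(values, bit):
    """Sort nonnegative integers below 2**(bit + 1) by repeated sifting."""
    # A sequence with at most one element, or with no key bits left,
    # is output as part of the sorting result.
    if len(values) <= 1 or bit < 0:
        return list(values)
    zeros, ones = sift(values, bit)
    # Each output sequence is fed to another sifter that uses the next lower bit.
    return sort_by_sifting(zeros, bit - 1) + sort_by_sifting(ones, bit - 1)


print(sort_by_sifting([5, 2, 7, 0, 3, 6], bit=2))  # [0, 2, 3, 5, 6, 7]
```

With keys of ⌈log_2 N⌉ bits the recursion has O(log N) levels, which is in line with a sorter built from O(log N) layers of sifters.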
In S84, the elemental unit 20 inputs target data to be processed of size greater than or equal to one bit. At the same time, the elemental unit 20 inputs destination data representing a lane number of a lane where Data(p,m), a logical OR of the input target data, should be routed to, the size of the destination data being ⌈log_2 N⌉ bit(s) (S85). Then, the elemental unit 20 compares the bit #q from the least significant bit of Des(p,m), a logical OR of the input destination data, with the bit value c (S86). Based on the comparison result, the elemental unit 20 outputs both (i) one of Data(p,m) and the value 0 as the target data and (ii) one of Des(p,m) and the value 0 as the destination data bound for the elemental unit #m included in the next stage. If m+2^q < N, the elemental unit 20 further outputs both the other of Data(p,m) and the value 0 as the target data and the other of Des(p,m) and the value 0 as the destination data, bound for the elemental unit #(m+2^q) included in the next stage (S87). After executing the processing of S84-S87 for all elemental units in all stages, the data shifter 10 outputs, as shifted output data of the lane #m, a logical OR of the target data which the elemental units included in the last stage output bound for the elemental unit #m included in the next stage (S88).
With the processing described above, it is possible to construct a data spreading/stuffing shifter including a control circuit with a circuit size of O(N log N).
Embodiments of the present invention have been described in detail above. Aside from an information processing apparatus, the embodiments may also take the form of a method in which a computer executes the above processing, or of a program stored on a storage medium.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2010/056269 | 3/31/2010 | WO | 00 | 9/21/2012 |