1. Field of the Invention
The present invention relates generally to communications and more particularly relates to mixed radix fast Hadamard transforms (FHTs) employed in communication systems.
2. Description of the Related Art
The Hadamard transform (HT) and the associated fast Hadamard transform (FHT) are used extensively in wireless communications and other communication systems to speed up signal processing in, for example, physical random access channel (PRACH) detection and channel quality indication (CQI) maximum likelihood decoding in today's code division multiple access 3G and 4G wireless communication systems. Typically, a receiver demodulates and despreads a received signal, and then applies an HT to provide the demodulated data symbols.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In one embodiment, the present invention allows for applying an order N fast Hadamard transform (FHT) of a vector U using a mixed radix FHT in a receiver of a communication system, N being a positive integer, when receiving signals from a transmitter over a channel and generating the vector U. The method includes, in an FHT module of a decoder in the receiver, planning n stages of the mixed radix FHT, where n is a positive integer, each stage defined by corresponding logic, and decomposing the order N FHT into n low order FHTs, such that N=KnKn−1 . . . K1 and U=UK
The aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
FIG. B is a block diagram illustrating a 4-point FHT structure constructed using the 2-point FHT shown in
Hereinafter, embodiments of the present invention are described with reference to the drawings. The present invention relates to a communication system including a method and apparatus for performing a mixed radix fast Hadamard transform (FHT), which reduces the complexity of a high order FHT. The method provided, in accordance with the present invention, may decompose a high order FHT into low order FHTs, calculate each low order FHT with a Digital Signal Processor (DSP) intrinsic instruction set, and then recombine the calculated results of each low order FHT to form the final output result. More specifically, the method described below may perform an FHT of order $2^N$ in n stages of mixed radix FHTs. In each mixed radix FHT calculation, the input FHT of order $2^N$ is decomposed into n stages of smaller FHTs of order $2^{K_i}$ ($K_i < N$, i=1 to n), and the input vectors for each subsequent stage are calculated in the preceding stage. The present invention provides a method that can reconstruct the output result from n stages of smaller order FHTs. Further, the order $K_i$ of each stage need not be the same. When coupled with advanced digital signal processing intrinsic support of FHT, the mixed radix FHT method disclosed in the present invention provides a significant speedup over the conventional FHT method. In addition, the mixed radix FHT method of the present invention may be applied to decoders in communication systems, especially in CQI decoding in wideband code division multiple access (WCDMA) and high speed packet access (HSPA) wireless receivers. The present invention is not so limited, and any existing and emerging computer systems may apply the mixed radix FHT method to perform desirable calculations.
Operation of the FHT module of
A Hadamard transform transforms a 1×N vector $U_N$ by an N×N Hadamard matrix $H_N$, where N is a positive integer greater than 1 and a power of two. Here the Hadamard transform is called an order N Hadamard transform. The transformation result is a new 1×N vector as defined in relation (1):
$Q_N = U_N H_N$. (1)
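A Hadamard matrix (HM) might be constructed recursively as in relation (2):
$H_N = H_{2^K} = H_2 \otimes H_{2^{K-1}}$ (2)
where $\otimes$ denotes the Kronecker product, N and K are positive integers, and $H_2$ is the fundamental Hadamard matrix defined in relation (3):
$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ (3)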
Thus, the straightforward way of computing a Hadamard transform, by multiplying the vector with the Hadamard matrix, requires $N^2$ multiplication and addition operations for a 1×N vector, which means that the Hadamard transform by matrix multiplication has a complexity of $O(N^2)$.
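By way of illustration only, the direct matrix form of relation (1) may be sketched in Python with NumPy as follows. The function names `sylvester_hadamard` and `hadamard_transform_direct` are illustrative assumptions and are not part of the described embodiments; the sketch assumes the Sylvester (natural) ordering of the Hadamard matrix.

```python
import numpy as np

# Fundamental 2x2 Hadamard matrix.
H2 = np.array([[1, 1],
               [1, -1]])

def sylvester_hadamard(n):
    """Return the n x n Sylvester Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("order must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(H2, h)  # double the order with one Kronecker product
    return h

def hadamard_transform_direct(u):
    """Compute Q = U H by plain matrix multiplication (about N^2 operations)."""
    u = np.asarray(u)
    return u @ sylvester_hadamard(u.size)

q = hadamard_transform_direct([1, 2, 3, 4])
print(q.tolist())  # [10, -2, -4, 0]
```

Each of the N outputs is a signed sum of all N inputs, which is where the $N^2$ operation count of the direct method comes from.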
In order to speed up computation, the most widely used method for the Hadamard transform is the fast Hadamard transform (FHT). For an order N Hadamard transform, most FHT algorithms require $N \log_2 N$ addition/subtraction operations, with complexity $O(N \log_2 N)$.
As shown, inputs 0, 1 and 2, 3 are provided to H2 transform modules 203, 204, respectively, and the outputs of H2 transform modules 203, 204 are cross-applied to lower and upper butterfly configurations 205, 206 formed by H2 transform modules. A pair of outputs [0′,1′], [2′,3′] from upper and lower butterfly configurations 205, 206 become the output values 0′, 1′, 2′, and 3′. The complexity of $H_2$ is $2 \times \log_2 2 = 2$ and the complexity of $H_4$ is $4 \times \log_2 4 = 8$ for the conventional FHT.
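The butterfly structure of the conventional FHT may be sketched, for illustration, as the following pure-Python routine. The name `fht` and the returned operation counter are assumptions introduced here to make the $N \log_2 N$ count visible; the routine is a generic radix-2 FHT and is not asserted to match the exact wiring of the figure.

```python
def fht(u):
    """Radix-2 fast Hadamard transform in Sylvester ordering.

    Returns the transformed list and the number of additions/subtractions.
    """
    a = list(u)
    n = len(a)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    ops = 0
    h = 1
    while h < n:
        # One pass of 2-point butterflies with span h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
                ops += 2  # one addition and one subtraction
        h *= 2
    return a, ops

print(fht([1, 2, 3, 4]))  # ([10, -2, -4, 0], 8)
```

For N=2 the routine performs 2 operations and for N=4 it performs 8, in agreement with the complexities stated above.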
In many applications where new processors are employed, the new processor instruction set allows for further reduction in the number of additions employed to implement the FHT by use of tailored FHT instructions. In an advanced digital signal processor (DSP) design, these new instructions are introduced to calculate a higher order FHT such as, for example, N=16, by one instruction/operation. Although the conventional FHT approach exists, these new instructions may allow a transform module to perform the desired FHT even faster if the complexity of a high order FHT is reduced.
First, let $H_M$ represent the Sylvester Hadamard matrix of relation (2) of order M. It has been proved that the Kronecker product of two Hadamard matrices of orders K and M is also a Hadamard matrix of order K×M, that is, $H_{K \times M} = H_K \otimes H_M$. Thus, the Hadamard matrix $H_{M \times N \times K}$ can be constructed from three smaller Hadamard matrices, $H_M$, $H_N$ and $H_K$, as given in relation (5):
$H_{M \times N \times K} = H_M \otimes H_N \otimes H_K = H_K \otimes H_{M \times N}$ (5)
where $\otimes$ is defined as a Kronecker product.
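An example of the Kronecker product of two Hadamard matrices $H_K$ and $H_M$ is given by relation (6):
$H_{K \times M} = H_K \otimes H_M = \begin{bmatrix} H_K(1,1) H_M & \cdots & H_K(1,K) H_M \\ \vdots & \ddots & \vdots \\ H_K(K,1) H_M & \cdots & H_K(K,K) H_M \end{bmatrix}$ (6)
where the resulting new Hadamard matrix $H_{K \times M}$ has new elements $H_K(i,j) H_M$, where $H_K(i,j)$ is the i-th row and j-th column element of $H_K$.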
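The block structure of the Kronecker product may be checked numerically with a short sketch such as the following. The orders K=4 and M=8, the helper `hadamard`, and the variable names are assumptions chosen only for illustration.

```python
from functools import reduce
import numpy as np

H2 = np.array([[1, 1], [1, -1]])

def hadamard(order):
    """Sylvester Hadamard matrix of a power-of-two order."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    stages = order.bit_length() - 1
    return reduce(np.kron, [H2] * stages, np.array([[1]]))

K, M = 4, 8
hk, hm = hadamard(K), hadamard(M)
hkm = np.kron(hk, hm)

# Block (i, j) of the product equals H_K(i, j) * H_M (0-based indices here).
i, j = 1, 3
block = hkm[i * M:(i + 1) * M, j * M:(j + 1) * M]
block_matches = np.array_equal(block, hk[i, j] * hm)

# A Hadamard matrix of order K*M satisfies H H^T = (K*M) I.
is_hadamard = np.array_equal(hkm @ hkm.T, K * M * np.eye(K * M, dtype=int))

# For Sylvester matrices the product is again the Sylvester matrix of order K*M.
same_as_sylvester = np.array_equal(hkm, hadamard(K * M))
print(block_matches, is_hadamard, same_as_sylvester)
```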
The order M Hadamard transform may be represented by $HT_M$. The decomposition of the Hadamard transform described below may be represented through an example of the calculation of $HT_{M \times N \times K}$ by $HT_M$, $HT_N$ and $HT_K$.
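By definition, the Hadamard transform of a vector $U_{M \times N \times K} = \{\mu_1, \mu_2, \ldots, \mu_{M \times N \times K}\}$ with M×N×K elements is the multiplication of this vector with the Hadamard matrix $H_{M \times N \times K}$ as given in relation (7):
$U_{M \times N \times K} H_{M \times N \times K} = \{\mu^1_{M \times N}, \mu^2_{M \times N}, \ldots, \mu^K_{M \times N}\} (H_K \otimes H_{M \times N}) = \{Q^1_{M \times N}, Q^2_{M \times N}, \ldots, Q^K_{M \times N}\}$ (7)
where $\mu^i_{M \times N} = \{\mu_{(M \times N)(i-1)+1}, \ldots, \mu_{(M \times N)(i-1)+M \times N}\}$, i=1 to K, is the i-th subset of the input vector $U_{M \times N \times K}$, each with M×N elements, and
$Q^j_{M \times N} = \sum_{i=1}^{K} H_K(i,j)\, \mu^i_{M \times N} H_{M \times N}$, for j=1 to K (8)
is an output vector with M×N elements obtained by the mixed radix FHT.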
A vector $V^i$ is defined as in relation (9):
$V^i = \mu^i_{M \times N} H_{M \times N} = \{\nu^i_1, \nu^i_2, \ldots, \nu^i_{M \times N}\}$, for i=1 to K (9)
where $V^i$ is also an M×N dimension vector. Substituting relation (9) into relation (8), the r-th item $q^j_r$ in $Q^j_{M \times N}$ is obtained as in relation (10):
$q^j_r = \sum_{i=1}^{K} H_K(i,j)\, \nu^i_r$ (10)
where r=1 to M×N.
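Assembling together all r-th items $q^j_r$ in the $Q^j_{M \times N}$, for j=1 to K, provides relation (11):
$\{q^1_r, q^2_r, \ldots, q^K_r\} = \{\nu^1_r, \nu^2_r, \ldots, \nu^K_r\} H_K$ (11)
and, letting Q be as defined in relation (12),
$Q = \begin{bmatrix} q^1_1 & q^2_1 & \cdots & q^K_1 \\ \vdots & \vdots & & \vdots \\ q^1_{M \times N} & q^2_{M \times N} & \cdots & q^K_{M \times N} \end{bmatrix}$ (12)
the output result of $U_{M \times N \times K} H_{M \times N \times K}$ may be obtained by reading the columns of Q.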
The matrix Q may be represented by the $V^i$ and $Q^j_{M \times N}$ by combining relations (8) and (9) to provide relation (13),
$Q = \{(V^1)^T (V^2)^T \ldots (V^K)^T\} H_K = X^i_s H_K$ (13)
where $X^i_s = \{(V^1)^T (V^2)^T \ldots (V^K)^T\}$, T=1 to M×N, and $X^i_s$ is the i-th stage input matrix at stage s. The definition of a stage is described below in detail.
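The single split described by relations (7) through (13) may be sketched as follows, under the assumption of Sylvester-ordered matrices. The function name `split_stage_transform` and the use of a dense matrix for the order M×N transform are illustrative assumptions; the sketch only checks that reading the columns of Q reproduces the full transform.

```python
import numpy as np

def hadamard(order):
    """Sylvester Hadamard matrix; order is assumed to be a power of two."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h

def split_stage_transform(u, k):
    """Compute U H_{MN*K} using only H_{MN} and H_K."""
    u = np.asarray(u)
    mn = u.size // k
    sub = u.reshape(k, mn)        # row i holds the i-th subset of M*N elements
    v = sub @ hadamard(mn)        # row i holds V^i, the transform of subset i
    x = v.T                       # stage input matrix: M*N rows, K columns
    q = x @ hadamard(k)           # Q = X H_K, one order-K transform per row
    return q.T.reshape(-1)        # read the columns of Q in sequence

rng = np.random.default_rng(1)
u = rng.integers(-5, 6, size=32)
direct = u @ hadamard(32)
print(np.array_equal(split_stage_transform(u, 4), direct))
```

Each row of X is transformed independently by the order K transform, so M×N small transforms replace one large one at this stage.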
The stage is defined as a process of calculation that depends on the same, original order of Hadamard transform. For example, $HT_K$ in the above relation (13) is the only Hadamard transform performed; thus, the calculation with $HT_K$ forms stage 3 for this example. The input vector for this stage, $V^i$, is calculated in another stage, which is a key feature of embodiments of the present invention for decomposition of a large order Hadamard transform into many smaller-order Hadamard transforms. In the above example, three stages are generally required, depending on $HT_M$, $HT_N$ and $HT_K$, respectively. That is, the calculation with $HT_N$ forms stage 2, and the calculation with $HT_M$ forms stage 1. As described above, relation (13) shows stage 3 (i=1 and s=3) for this example, and $X^i_s$ is a matrix with M×N rows and K columns since $HT_K$ is the last stage.
In order to calculate matrix Q at stage 3, all vectors $V^i$ may be needed for i=1 to K. Based on the previous definition, it is already known that
$V^i = \mu^i_{M \times N} H_{M \times N} = \{\nu^i_1, \nu^i_2, \ldots, \nu^i_{M \times N}\}$, for i=1 to K
which is the Hadamard transform $HT_{M \times N}$ of the vector $\mu^i_{M \times N}$ performed at stage 2.
Similarly, the same techniques as described above may be applied at stage 2 to calculate the $V^i$ needed by stage 3. If $\mu^i_{M \times N}$ is divided into N equal size vectors with M elements in each vector, i.e., $\mu^i_{M \times N} = \{\mu^{i1}_M, \mu^{i2}_M, \ldots, \mu^{iN}_M\}$, $V^i$ may be calculated as in relation (14):
$V^i = \mu^i_{M \times N} H_{M \times N} = \{\mu^{i1}_M, \mu^{i2}_M, \ldots, \mu^{iN}_M\} (H_N \otimes H_M)$ (14)
Let $\tilde{V}^p = \mu^{ip}_M H_M$, p=1 to N, which is an M dimension vector. Following the previous procedure, $V^i$ may be obtained from relation (15):
$\{(\tilde{V}^1)^T (\tilde{V}^2)^T \ldots (\tilde{V}^N)^T\} H_N = X^i_2 H_N$ (15)
by reading its columns in sequence and concatenating them into an M×N dimension vector. The calculation of all $V^i$ finishes at stage 2, and at stage 2, only $HT_N$ is performed.
The next stage, stage 1, calculates all $\tilde{V}^p$, each of which might be a straightforward Hadamard transform of the vector $\mu^{ip}_M$ for i=1 to K and p=1 to N. The $\mu^{ip}_M$ is obtained from $U_{M \times N \times K}$ by dividing $U_{M \times N \times K}$ into N×K equal sized (size=M) smaller vectors, and, at stage 1, only $HT_M$ is performed.
A procedure for decomposing the Hadamard transform is described above; however, the present invention is not so limited, and other procedures may be employed. For example, in practice, the stage described last, stage 1, may be performed first (i.e., calculate the $\tilde{V}^p$ first). In addition, a component of the decomposition of a large order Hadamard transform is that each subsequent stage uses the output results calculated from the preceding stage as its input vectors (i.e., each stage finishes its Hadamard transform based on the results obtained from the previous stage).
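The stage-by-stage procedure, with stage 1 performed first and each stage consuming the outputs of the previous one, may be sketched as the following loop. The function `mixed_radix_fht` and its `radices` argument (listing the order of stage 1 first) are hypothetical names; each small transform is modeled here by a dense matrix product standing in for an intrinsic FHT instruction.

```python
import numpy as np

def hadamard(order):
    """Sylvester Hadamard matrix; order is assumed to be a power of two."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h

def mixed_radix_fht(u, radices):
    """Hadamard transform of u computed with one small transform per stage."""
    v = np.asarray(u)
    if int(np.prod(radices)) != v.size:
        raise ValueError("product of radices must equal the vector length")
    length = 1  # length of each vector produced by the previous stage
    for k in radices:
        # Group k consecutive vectors of the current length.
        groups = v.reshape(-1, k, length)
        # Stage input matrices: 'length' rows and k columns per group.
        x = groups.transpose(0, 2, 1)
        y = x @ hadamard(k)  # every row gets one order-k transform
        # Concatenate the columns of each output matrix in sequence.
        v = y.transpose(0, 2, 1).reshape(-1)
        length *= k
    return v

rng = np.random.default_rng(7)
u = rng.integers(-3, 4, size=16)
print(np.array_equal(mixed_radix_fht(u, [2, 4, 2]), u @ hadamard(16)))
```

With `radices=[M, N, K]` the three passes of the loop correspond to stages 1, 2 and 3 of the example above.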
The mixed radix FHT in accordance with embodiments of the present invention may be performed by any existing and emerging processors, such as a data processor, a vector processor, a dedicated application specific IC (ASIC) or similar devices. Considering an FHT of a vector U of a given size N, and given an intrinsic support of FHT of size F and smaller, a general method of decomposition and implementation of the mixed radix FHT according to the present invention is described below in detail with reference to
The first step sets a number n of stages employed to decompose $N = K_n K_{n-1} \ldots K_1$.
Two situations might exist for n>1. If n>1 and $F^n = N$, then $K_1 = K_2 = \ldots = K_n = F$, and the stage planning 300 advances to a same radix FHT at each stage at step 306. Otherwise, if n>1 and $F^n > N$, the stage planning 300 advances to step 308. At step 308, an m stage is selected, where m=2, . . . , n, and an intrinsic support of FHT of size Φ not exceeding F is given. Then, stage planning 300 makes $K_1 = K_2 = \ldots = K_m = Φ$ and $K_{m+1} = K_{m+2} = \ldots = K_n = Φ/2$, such that $N = K_n K_{n-1} \ldots K_1$, Φ ≤ F, and $1/K_1 + 1/K_2 + \ldots + 1/K_n$ might be the smallest integer value.
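One possible way to realize such stage planning is sketched below. The balanced split of $\log_2 N$ across the stages is an assumption made for this sketch (an even split of the exponents keeps the sum of the reciprocals 1/K small), and the function name `plan_stages` is hypothetical; the selection rule of the described stage planning 300 may differ.

```python
def plan_stages(n, f):
    """Return stage orders [K_1, ..., K_n] with product n and each K <= f.

    Assumes n and f are powers of two with n > 1 and f > 1.
    """
    for value in (n, f):
        if value < 2 or value & (value - 1):
            raise ValueError("n and f must be powers of two greater than 1")
    total_bits = n.bit_length() - 1        # log2(N)
    max_bits = f.bit_length() - 1          # log2(F)
    stages = -(-total_bits // max_bits)    # smallest count with F^stages >= N
    base, extra = divmod(total_bits, stages)
    # 'extra' stages use the larger order (Phi); the rest use Phi / 2.
    # When extra is 0, every stage uses the same order.
    return [2 ** (base + 1)] * extra + [2 ** base] * (stages - extra)

print(plan_stages(1024, 16))  # [16, 8, 8]
print(plan_stages(256, 16))   # [16, 16]
```

For N=1024 and F=16 the sketch yields the orders 16, 8 and 8, whose product is 1024.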
The corresponding Hadamard Matrix HK
The embodiment of stage planning 300 shows selection of the number of the stages that are involved in transforming the vector UN by using the mixed radix FHT of the present invention. A general method of the mixed radix FHT in accordance with embodiments of the present invention is described below with respect to
As shown in
V1i={tilde over (V)}1iHK
At step 403, the data processor evaluates whether the process is complete. If the process is complete, then the final output result is provided at step 404. If the process is incomplete, method 400 advances to step 405. At step 405, the next m stages are performed, where m=2, . . . , n. If m=2, stage 2 is performed. At stage 2, the input vectors for stage 2 are constructed from the $V_1^i$ obtained at stage 1; that is, $\tilde{V}_2^i = V_1^i$, i=1 to $K_n K_{n-1} \ldots K_2$.
In general case, the input vectors of stage m are constructed by the output vectors of stage m−1. In step 406, the mixed radix FHT HK
and, thus, a total of $K_n K_{n-1} \ldots K_{m+1}$ matrices $X_m^k$ are at stage m ($X_m^k$ and $Y_m^k$ are both $K_{m-1} K_{m-2} \ldots K_1$ by $K_m$ matrices).
Further, for stage m+1, the input vectors $\tilde{V}_{(m+1)}^k$ are constructed based on the results of stage m, as shown in relation (18):
$\tilde{V}_{(m+1)}^k = \mathrm{VEC\_COL}(Y_m^k)$, (18)
where the function VEC_COL(Y) returns a vector by concatenating all columns of matrix Y in sequence. When m=n, only one vector, $\tilde{V}_{(m+1)}^i = \tilde{V}_{(n+1)}^i$, for i=1 only, is obtained. This vector $\tilde{V}_{(n+1)}^i$ is the final output result of the fast Hadamard transformed input vector UK
{tilde over (V)}(n+1)i=UK
Accordingly, the FHT of the input vector UK
One of the advantages of the mixed radix FHT of the present invention is reduction of the complexity of FHT. If a data processor is able to perform the Hadamard Transform, HTK
$N/K_m$ number of Hadamard Transforms of order $K_m$ may be required. For all n stages, a total
$\sum_{m=1}^{n} N/K_m$
number of Hadamard Transform operations may be required. If each operation takes L cycles,
$L \cdot N \sum_{m=1}^{n} 1/K_m$
cycles may be required to finish an FHT of a vector with N elements, which may be less than the complexity $O(N \log_2 N)$ of a conventional FHT.
A special case is $K_n = K_{n-1} = \ldots = K_1 = K$. In this case, the vector length is $N = K^n$. The complexity of the conventional FHT for this case may be $O(nK^n \log_2 K)$. If the data processor supports the $HT_K$ operation, by the above analysis, the complexity of the mixed radix FHT for this case may be $O(nK^{n-1})$, which is smaller than the complexity $O(N \log_2 N)$ of the conventional FHT.
At stage 1, an input vector containing 1024 elements is divided into 128 groups/vectors, with 8 elements in each vector. The outputs of stage 1 are also 128 vectors. Every 8 output vectors from stage 1 are grouped and permuted to form the stage input matrices $X_s^i$ for stage 2. After another round of $HT_8$, the outputs in stage 2 of every 8 input vectors are concatenated into a 64 element vector. At stage 3, the output vectors from stage 2 are permuted again to form the new stage input matrices $X_s^i$ for $HT_{16}$ of stage 3. The final $HT_{16}$ outputs are permuted and concatenated to generate the final FHT output result.
Note that stage 1 has output vectors of 8 elements, stage 2 has output vectors of 8×8=64 elements and stage 3 has the final output of a single vector with 8×8×16=1024 elements.
Regarding the complexity of the decomposed FHT, it is assumed that a vector processor can do $HT_{16}$, addition and subtraction operations each in 1 cycle. The conventional FHT algorithm takes $1024 \times \log_2 1024 = 10240$ cycles for N=1024. However, the mixed radix FHT algorithm of the present invention only takes 1024×(1/16+1/8+1/8)=320 cycles, according to the
$L \cdot N \sum_{m=1}^{n} 1/K_m$
cycles for the mixed radix FHT method discussed above. As such, the mixed radix FHT method provides a significant speedup over the conventional FHT method.
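The cycle counts quoted for N=1024 may be reproduced with the short sketch below. The function names are illustrative, and the model assumes, as in the text, that each small transform and each addition or subtraction costs one cycle; data movement between stages is not counted.

```python
import math

def conventional_fht_cycles(n):
    """N * log2(N) additions/subtractions, one cycle each (n a power of two)."""
    return n * (n.bit_length() - 1)

def mixed_radix_cycles(radices, cycles_per_transform=1):
    """Sum over stages of N / K_m small transforms, times cycles per transform."""
    n = math.prod(radices)
    return cycles_per_transform * sum(n // k for k in radices)

print(conventional_fht_cycles(1024))   # 10240
print(mixed_radix_cycles([8, 8, 16]))  # 320
```

In the same-radix special case with K=16 and n=3, the count is 3×16×16=768 transforms for N=4096.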
A mixed radix FHT in accordance with an embodiment of the present invention may be implemented in either hardware, software or a combination of hardware and software. For example, a computer may be programmed to execute software adapted to perform the mixed radix FHT or any portion thereof. A typical use of the mixed radix FHT of the present invention is to decode a block code used in telecommunication systems. A block diagram illustrating an example of a 3G WCDMA wireless communication system 600, where the block code is used to encode the CQI in a mobile station, is shown in
As shown, CQI 608 provides information of an instant downlink channel quality, which may be used by base station 602 to allocate a wireless resource and schedule services. CQI 608 reported by different mobile stations 604 allows base station 602 to select mobile station 604 with a good channel quality to receive services and thus increase the network service rate. However, due to a high rate of receiving CQI 608 from all mobile stations 604 in base station 602, it may consume a large portion of computation power of base station 602 to decode all CQI 608 encoded with the block code when a maximum likelihood method is used. Therefore, a reduced complexity of FHT, the mixed radix FHT of the present invention, is desirable to decode CQI 608.
A vector processor might be employed that has instructions to calculate a 16-point FHT, $H_{16}$, such as, for example, an LSI (LSI Corporation) vector processor. When decoding the block code (20, 10) from mobile station 604, a 1024-point FHT may be required. The mixed radix FHT of the present invention provides an approach to calculate the 1024-point FHT by using the 16-point, 8-point and 4-point FHTs. The 16-point FHT instruction may perform an 8-point or 4-point FHT by setting unused inputs to 0.
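The reuse of a 16-point transform for shorter inputs may be illustrated as follows. The function `ht16` is only a software stand-in for a hypothetical single-instruction 16-point transform, and the sketch assumes Sylvester ordering with the short input placed in the first positions and the result read from the first outputs; an actual instruction set may use a different convention.

```python
import numpy as np

def hadamard(order):
    """Sylvester Hadamard matrix; order is assumed to be a power of two."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h

H16 = hadamard(16)

def ht16(x):
    """Stand-in for a 16-point FHT instruction (hypothetical)."""
    return np.asarray(x) @ H16

def ht_small_via_ht16(u):
    """4-, 8- or 16-point transform using only ht16 and zero padding."""
    u = np.asarray(u)
    if u.size not in (4, 8, 16):
        raise ValueError("input length must be 4, 8 or 16")
    padded = np.zeros(16, dtype=u.dtype)
    padded[:u.size] = u            # unused inputs are set to 0
    return ht16(padded)[:u.size]   # the first outputs hold the short transform

u8 = np.arange(1, 9)
print(np.array_equal(ht_small_via_ht16(u8), u8 @ hadamard(8)))
```

This works because the first block row of $H_{16} = H_2 \otimes H_8$ consists of unsigned copies of $H_8$, so zeros in the remaining inputs leave the first eight outputs equal to the 8-point result.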
In the CQI (20, 10) code, the 10 CQI information bits are encoded into 20 bits. Base station 602 first receives the 20 encoded bits at step 702 and then tries to decode the 10 information bits. The 20 encoded bits of the CQI (20, 10) code may be distorted by interference or noise in a wireless channel.
At step 704, base station 602 maps the 20 encoded bits to 1024 symbols. The mapping process is based on the generation matrix that mobile station 604 used to encode the CQI. The detailed encoding process is described in standard 3GPP TS 25.212 V9.1.0 (2010-03), section 4.7. The generation matrix determines the positions of the 20 symbols (corresponding to the 20 encoded bits) in the 1024 symbols. The remaining symbols in the 1024 symbols are set to 0 if their positions are not mapped to the received 20 symbols.
At step 706, the 1024-point FHT using the mixed radix FHT is then applied to the 1024 symbols, which may also yield 1024 symbols as results.
At step 708, among the 1024 resulting symbols, the symbol with the maximum value is identified, whose index is the final output of the 10-bit CQI information. Hence, the 10 information bits are decoded from the 20 encoded bits.
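The decoding steps above may be sketched as follows. The generator matrix used here is a made-up systematic matrix chosen for illustration; it is NOT the basis sequence table of the 3GPP standard. The mapping of each received symbol to the position given by its generator row read as a 10-bit integer, the BPSK mapping of bit 0 to +1, and the names `encode`, `decode` and `fht` are likewise assumptions of this sketch, and a plain radix-2 FHT stands in for the mixed radix FHT.

```python
import numpy as np

def fht(values):
    """Radix-2 fast Hadamard transform in Sylvester ordering."""
    a = np.array(values, dtype=float)
    h = 1
    while h < a.size:
        for start in range(0, a.size, 2 * h):
            top = a[start:start + h].copy()
            bottom = a[start + h:start + 2 * h].copy()
            a[start:start + h] = top + bottom
            a[start + h:start + 2 * h] = top - bottom
        h *= 2
    return a

# Hypothetical 20 x 10 generator: identity rows plus fixed pseudo-random rows.
rng = np.random.default_rng(0)
G = np.vstack([np.eye(10, dtype=int), rng.integers(0, 2, size=(10, 10))])

def encode(message, generator):
    """Encode an integer message into code bits (bit n of message is a_n)."""
    bits = (message >> np.arange(generator.shape[1])) & 1
    return (generator @ bits) % 2

def decode(received, generator):
    """Maximum likelihood decoding by mapping, FHT and peak search."""
    k = generator.shape[1]
    positions = generator @ (1 << np.arange(k))  # each row as a k-bit integer
    symbols = np.zeros(1 << k)                   # unmapped positions stay 0
    np.add.at(symbols, positions, received)      # repeated positions accumulate
    return int(np.argmax(fht(symbols)))          # index of the largest output

message = 682
received = 1.0 - 2.0 * encode(message, G)        # noise-free BPSK symbols
print(decode(received, G) == message)
```

Each FHT output equals the correlation of the received symbols with one candidate codeword, so the index of the maximum is the most likely 10-bit message.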
From the above description of decoding process 700, step 706 of the 1024-point FHT using the mixed radix FHT of the present invention is a computation intensive step in decoding process 700. Thus, the mixed radix FHT of the present invention provides a reduction in the computation cycles required by utilizing 16/8/4 FHTs for 1024 FHT.
In alternative embodiments, the mixed radix FHT method of the present invention may be applicable to implementations of the invention in integrated circuits, field programmable gate arrays (FPGAs), chip sets or application specific integrated circuits (ASICs), DSP circuits, wired or wireless implementations and other communication system products.
The present invention is not limited in the manner of implementation. It is understood that, while the embodiment shown herein is a typical computer system implemented with the mixed radix FHT, the present invention may be implemented in different computer systems known in the art. One skilled in the computer arts can construct the mixed radix FHT mechanisms described herein in either hardware, software or a combination of hardware and software.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
As used in this application, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion.
Additionally, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
Moreover, the terms “system,” “component,” “module,” “interface,” “model” or the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Although the subject matter described herein may be described in the context of illustrative implementations to process one or more computing application features/operations for a computing application having user-interactive components, the subject matter is not limited to these particular embodiments. Rather, the techniques described herein can be applied to any suitable type of user-interactive component execution management methods, systems, platforms, and/or apparatus.
While the exemplary embodiments of the present invention have been described with respect to processes of circuits, including possible implementation as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack, the present invention is not so limited. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. The present invention can also be embodied in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus of the present invention.
Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value or range.
The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.
It should be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present invention.
Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
As used herein in reference to an element and a standard, the term “compatible” means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.
No claim element herein is to be construed under the provisions of 35 U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for.”
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.