The present invention relates generally to wideband MIMO-OFDM (Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing) systems and, more particularly, to a method for interpolation-based QR decomposition in MIMO-OFDM systems using a D-SMC (Deterministic Sequential Monte Carlo) demodulator with per-chunk ordering.
The following works by others are mentioned in the application and referred to by their associated reference:
The deterministic sequential Monte-Carlo (D-SMC) demodulator is one of the most promising demodulators for multiple antenna systems over narrowband fading channels [1]. The extension of the MIMO D-SMC demodulator from the narrowband case to the wideband system based on OFDM requires the computation of QR decomposition for each of the data-tones. The number of data tones can range from 48 (as in IEEE 802.11a/g standards) to 6817 (as in the DVB-T Standard). Interpolation based QR decomposition algorithms were recently proposed in [2] for MIMO-OFDM systems which employ an identical channel independent order of demodulation for all tones and it was shown that significant complexity reduction over the previous brute-force method could be achieved particularly for large number of tones and small channel orders. [3] modified the interpolation based QR decomposition techniques developed in [2], for a MIMO-OFDM system where each transmitter uses an independent SISO encoder and SIC decoding is employed at the receiver. To improve the performance obtained with the SIC decoder [3] suggested a common ordering where one “common” albeit channel dependent permutation is computed for all data tones prior to interpolation. The common ordering rule suggested in [3] was an extension of the sorted QR rule suggested earlier for the narrow-band MIMO channel. Extensions of the SINR maximizing greedy rule derived originally for the narrowband channel in [4] have also been proposed.
Various prior art techniques for QR decomposition are illustrated. The technique of
As noted above, the common ordering rule can result in good performance gains and is the best that can be done with the SIC decoder. The post-decoding feedback stage in the SIC decoder does not allow for per-tone ordering rules. On the other hand the D-SMC demodulator based receiver has no such restriction and in fact benefits more from (finer) per-tone based ordering. However, interpolation based QR decomposition algorithms do not provide any complexity reductions (in-fact can increase the complexity!) when per-tone ordering is employed. Thus there is a tradeoff involved since finer ordering (per-tone as opposed to common) results in better performance but at higher processing complexity (separate QR decomposition for each tone as opposed to interpolation based method).
Accordingly, there is a need for a method which resolves the tradeoff of the above known techniques, using per-chunk ordering and corresponding interpolation based QR decomposition (I-QRD) processes.
In accordance with the invention, a method includes determining one of a number of tones per chunk and a number of chunks responsive to a sub-band bandwidth and a coherence bandwidth; determining an order for each chunk; and determining, for each chunk, QR decompositions for all tones responsive to the determined order.
In another aspect of the invention, a method includes determining a number of tones per chunk required to find a per-chunk order responsive to at least one of sub-band bandwidth and coherence bandwidth and number of chunks; determining an order for each chunk using representative tones in that chunk; computing, for each chunk, QR decompositions for all basis tones responsive to the determined order; and interpolating and determining QR decompositions for remaining tones for each chunk using the QR decompositions of said basis tones. In a preferred embodiment, for each chunk, the QRDs for the remaining tones are interpolated and determined using the QR decompositions of the basis tones.
In a yet further aspect of the invention, a method includes using a set of basis tones to interpolate and determine channel responses of remaining tones; determining the ideal number of chunks responsive to at least one of sub-band bandwidth and coherence bandwidth; determining an order for each said chunk using any one representative tone in that respective chunk; and computing, for each chunk, QR decompositions for all tones responsive to the determined order.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The inventive technique determines the optimal number of chunks in each sub-band based on the channel coherence bandwidth, the bandwidth of the allocated resource blocks (or sub-bands) and the given complexity constraints. Ordering rules are provided which determine an optimal order for each chunk and capture most of the gain provided by per-tone ordering while allowing employment of interpolation based QR decomposition. A process to determine QR decompositions via interpolation in a system employing per-chunk ordering is also provided.
The inventive aspect of per-chunk ordering (which includes determining the optimal number of chunks followed by an optimal order for each chunk) is novel and cannot be inferred or derived from either the common ordering of [2] or the fixed ordering of [1] or other prior art. Once the optimal number of chunks along with the order for each chunk is decided, it is possible to extend the interpolation based algorithms of [1] in a straight-forward manner. The invention also provides an efficient interpolation based QR decomposition process which has a lower average complexity than that of the straight-forward extension of the best algorithm of [1] and the prior art.
The frequency selective MIMO channel is converted into a set of N parallel narrowband channels via OFDM. Let Mr and Mt denote the number of receive and transmit antennas. Then the channel model at the jth tone can be written as
\[ y_j = H_j x_j + v_j \]
where Hj is the Mr×Mt channel response matrix for the jth tone. To use the basic D-SMC demodulator we need to determine the QR decomposition Hj=QjRj for each tone. To use the D-SMC demodulator with MMSE pre-processing we instead need to determine Qj,I and Rj, where [Hj;I]=QjRj is the QR decomposition of the augmented channel matrix and Qj,I is the matrix formed by the first Mr rows of Qj. For brevity, this part of the application only discusses the D-SMC without MMSE pre-processing. All the following steps apply directly to the case with MMSE pre-processing after simply replacing Hj with the augmented matrix [Hj;I].
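The two decompositions described above can be illustrated for a single tone. The following is only a minimal sketch using NumPy; the function name `qr_for_dsmc` and the random test channel are hypothetical and are not part of the described method.

```python
import numpy as np

def qr_for_dsmc(H, mmse=False):
    """Return (Q_used, R) for one tone.

    Without MMSE pre-processing: H = Q R and the whole Q is returned.
    With MMSE pre-processing: [H; I] = Q R and only the first Mr rows
    of Q are returned, as described in the text.
    """
    Mr, Mt = H.shape
    A = np.vstack([H, np.eye(Mt)]) if mmse else H
    Q, R = np.linalg.qr(A)  # reduced QR: Q has Mt columns
    return Q[:Mr, :], R

# Hypothetical 4x4 channel for one tone (illustrative data only).
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

Q, R = qr_for_dsmc(H)                     # basic D-SMC
Q_top, R_aug = qr_for_dsmc(H, mmse=True)  # MMSE pre-processing
```

In the augmented case the first Mr rows of Q times R still reproduce H, which is why the remaining steps carry over unchanged.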
The performance of D-SMC can be improved if per-tone ordering is employed. Here for the jth tone the QR decomposition is computed for HjPj, where Pj is a permutation matrix that is optimized separately for the jth tone. In a system employing sub-band scheduling, resources are allocated to a scheduled user in the form of multiple sub-bands, each being a set of adjacent tones. In each sub-band a few tones are designated as pilot tones and known pilot symbols are transmitted over these tones for channel estimation. We now describe our per-chunk ordering rule. First, for each sub-band we determine the ideal number of chunks, denoted by Lideal, as equal to the ratio of the sub-band bandwidth and the coherence bandwidth of the underlying channel which in turn can be determined from its estimated delay spread. Then using the procedure described below, the optimal number of chunks per sub-band (denoted by L) is determined and an optimal permutation is chosen per chunk. To illustrate, in
To choose L we note that the computational complexity (of ordering as well as interpolation) increases with the number of chunks. The complexity incurred over each sub-band for any choice of the number of chunks can be determined, for instance, from the analytical expressions we have derived. Then, for given complexity constraints, the optimal number of chunks per sub-band is defined to be the minimum of the ideal number of chunks and the largest number of chunks satisfying the given constraints.
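The selection rule for L can be sketched as follows. This is an illustration under stated assumptions: the ratio is rounded up to an integer, the cost model `cost_of` is a hypothetical stand-in for the analytical complexity expressions, and the fallback to a single chunk when no choice fits the budget is also an assumption.

```python
import math

def ideal_num_chunks(subband_bw_hz, coherence_bw_hz):
    # Ratio of sub-band bandwidth to coherence bandwidth.
    # Rounding up (and the floor of 1) is an assumption of this sketch.
    return max(1, math.ceil(subband_bw_hz / coherence_bw_hz))

def optimal_num_chunks(l_ideal, cost_of, budget):
    """Minimum of l_ideal and the largest chunk count within the budget.

    cost_of(L) is assumed to be non-decreasing in L, so searching
    1..l_ideal gives the same answer as the rule in the text.
    """
    feasible = [L for L in range(1, l_ideal + 1) if cost_of(L) <= budget]
    return max(feasible) if feasible else 1  # fallback is an assumption

# Hypothetical linear cost model, for illustration only.
toy_cost = lambda L: 100 + 40 * L
l_ideal = ideal_num_chunks(1_800_000, 300_000)
L_tight = optimal_num_chunks(l_ideal, toy_cost, budget=250)
L_loose = optimal_num_chunks(l_ideal, toy_cost, budget=1000)
```

With a loose budget the rule returns the ideal number of chunks; with a tight budget it returns the largest affordable value.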
Next, to determine the permutation for each chunk we propose column-norm based ordering, where the order selected is the non-increasing order of the powers received from the transmitters over either the representative tone(s) (example: the center tone) or all the pilot tones in the chunk. In other words, the transmitter which is deemed to correspond to the highest received power is the first (or root) node in the decision tree of the D-SMC demodulator, the one with the second highest power is the second node, and so on. In case the center tone is employed for ordering and it is not a pilot tone, its channel response can be determined via interpolation from the estimates available from the pilot tones. The total number of tones (either representative or pilots) that are used to determine all the orders over the sub-band is fixed at Lideal, irrespective of the chosen number of chunks L.
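Column-norm based ordering amounts to sorting the transmitters by the squared norms of their channel columns, summed over the tones used for ordering. A minimal sketch, with hypothetical function names and a small hand-made channel:

```python
import numpy as np

def column_norm_order(H_tones):
    """H_tones: array of shape (T, Mr, Mt), the channel matrices of the
    representative or pilot tones of one chunk. Returns the transmitter
    indices in non-increasing order of received power."""
    power = np.sum(np.abs(H_tones) ** 2, axis=(0, 1))  # one value per transmitter
    return np.argsort(-power, kind="stable")

def permutation_matrix(order):
    """Matrix P such that H @ P has columns H[:, order]."""
    Mt = len(order)
    P = np.zeros((Mt, Mt))
    P[order, np.arange(Mt)] = 1.0
    return P

# One tone, 2 receive and 3 transmit antennas; column powers are 1, 25, 4.
H_demo = np.array([[[1.0, 3.0, 0.0],
                    [0.0, 4.0, 2.0]]])
order_demo = column_norm_order(H_demo)
P_demo = permutation_matrix(order_demo)
```

Here the second transmitter has the highest power and therefore becomes the root node of the decision tree.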
Next, the following steps summarize the basic version of the inventive interpolation based QR decomposition algorithm for per-chunk ordering. The channel matrices of the pilot tones are estimated and the optimal number of chunks is determined. Then for each chunk: i) interpolate and determine channel responses of the representative tones if the available pilot tones are insufficient, ii) determine the optimal order (or permutation), iii) obtain the QR decompositions of all representative and pilot tones corresponding to the order determined, iv) using the QR decompositions of step iii), interpolate and determine the QR decompositions of all data tones in the chunk.
The basic version captures the essence of the idea which is to compute the QR decompositions of all the data tones in a chunk via interpolation instead of the brute-force direct decomposition which involves determining the channel matrix of each tone first and then its QR decomposition. We have improved this basic version considerably by avoiding redundant computations while determining the L*N sets of QR decompositions of the (representative and) pilot channels (one for each chunk in a system with L chunks/sub-band and N sub-bands) and by exploiting interpolation even in computing the QR decompositions of the (representative and) pilot channels for each chunk.
The invention allows considerable complexity reductions over the existing brute-force methods with negligible performance degradation. With the inventive aspect of per-chunk ordering, the number of chunks is a design parameter. Also provided is a way to determine the ideal number of chunks such that increasing the number of chunks beyond it provides no performance improvements. A method to determine the optimal number of chunks for given complexity constraints is also provided. Methods to determine an optimal permutation for each chunk as well as efficient interpolation-based QR decomposition algorithms are also disclosed.
In the above table are provided the worst-case complexity of the inventive process as a fraction of the average complexity of the corresponding brute-force method (of identical performance) for several MIMO configurations. The ideal number of chunks in the allocated sub-band as well as the number of paths were taken to be 6. In the table, the first and the second rows correspond to 1 and 6 chunks, respectively. The column label (4×4; 500) means a MIMO system with 4 receive and 4 transmit antennas and 500 data tones and so on.
Detailed Analysis
We now consider some known results and define notation that will be used subsequently. Let H=QR be the QR decomposition of an Mr×Mt matrix H of rank Mt. Define
Set $\tilde{Q}=Q\Delta$ and $\tilde{R}=\Delta R$, and let $\tilde{q}_i$ and $\tilde{r}_i^T$ denote the ith column of $\tilde{Q}$ and the ith row of $\tilde{R}$, respectively. Then if $H \sim (L,0)$, i.e., H is a Laurent polynomial (LP) matrix of degree L, it has been shown in [2] that
\[ \tilde{q}_i \sim (iL,(i-1)L) \quad \& \quad \tilde{r}_i \sim (iL,iL), \quad 1 \le i \le M_t. \tag{2} \]
In the case when no MMSE processing is used, H represents the channel matrix of any data or pilot tone. In particular on the jth tone we have that
where N represents the total number of tones and $\{\tilde{H}_l\}$ are the time-domain multi-path channel responses. It is clear that
is an LP matrix of degree L and $H_j = H(\exp(i\theta))|_{\theta=2\pi j/N}$. Since the QR decomposition of the channel matrix $H_j$ of each data tone is required, the result in (2) can be directly used. On the other hand, when MMSE pre-processing is employed, for each data tone we need $R_j$ and $Q_{j,I}$, where $[H_j^T, I]^T = Q_j R_j$ represents the QR decomposition of the augmented matrix $[H_j^T, I]^T$ and $Q_{j,I}$ denotes the matrix formed by the first Mr rows of $Q_j$. Consider the matrix
It can be seen that since $\tilde{H}(\exp(i\theta))$ is also an LP matrix of degree L, the result in (2) can be directly used.
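The practical meaning of the LP property is that L+1 matrix coefficients determine the channel on every tone, so the channel on any tone can be interpolated from a sufficient number of pilot tones. The sketch below assumes the usual unnormalized DFT relation H_j = sum over l of H̃_l exp(−i2πjl/N) between taps and tones; the function names, the tap values and the pilot positions are hypothetical.

```python
import numpy as np

def tone_response(taps, j, N):
    """taps: (L+1, Mr, Mt) time-domain matrices. Returns H_j, assuming
    H_j = sum_l taps[l] * exp(-2j*pi*j*l/N)."""
    l = np.arange(taps.shape[0])
    phase = np.exp(-2j * np.pi * j * l / N)
    return np.tensordot(phase, taps, axes=(0, 0))

def all_tone_responses(taps, N):
    """Same relation for all N tones at once via a zero-padded FFT."""
    return np.fft.fft(taps, n=N, axis=0)

def taps_from_pilots(H_pilots, pilot_idx, L, N):
    """Recover the L+1 taps from at least L+1 pilot tones by solving the
    Vandermonde system V @ taps = H_pilots in the least-squares sense."""
    V = np.exp(-2j * np.pi * np.outer(pilot_idx, np.arange(L + 1)) / N)
    flat = H_pilots.reshape(len(pilot_idx), -1)
    taps, *_ = np.linalg.lstsq(V, flat, rcond=None)
    return taps.reshape(L + 1, *H_pilots.shape[1:])

# Hypothetical 4-path, 2x2 channel on a 64-tone grid.
rng = np.random.default_rng(1)
taps_demo = rng.standard_normal((4, 2, 2)) + 1j * rng.standard_normal((4, 2, 2))
H_all = all_tone_responses(taps_demo, 64)
pilots = np.array([0, 16, 32, 48])
taps_rec = taps_from_pilots(H_all[pilots], pilots, L=3, N=64)
```

With noise-free pilots the taps, and hence every tone, are recovered exactly; this is the interpolation step that the brute-force method applies to the channel matrices themselves.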
Per-Chunk Ordering
Let D be the set of allocated data channels. The allocated data tones can consist of adjacent tones, as in localized allocation, or of widely spaced tones, as in distributed allocation. Define the sub-band S to be a contiguous set of tones such that the bandwidth of S, denoted by BWS, is equal to the bandwidth of D, denoted by BWD. Let P be the set of representative or pilot tones. The channel matrices corresponding to the tones in P are either interpolated or estimated, and for our purposes in the latter case we assume perfect estimation. The objective is to obtain the QR decomposition of the channel matrix (or the augmented channel matrix) of each tone in D. One way to do this, referred to here as the brute-force method, is to interpolate and determine each Hj, j ∈ D, using the channel matrices from P (recall that this can be done since (3) and (4) are LP matrices) and then compute its QR decomposition. This method generally results in the highest complexity, but an advantage is that we can do per-tone ordering. In particular, after computing Hj we can select any permutation matrix Pj and then compute HjPj=QjRj. The methods suggested in [2,3] involve computing the QR decompositions of only the channel matrices of the pilot tones and obtaining the Q and R matrices for each of the data tones using interpolation. Although substantial computational savings can be accrued through these methods, a drawback is that we can at best employ one common ordering or permutation, i.e., we need to fix a common permutation P (which can be channel dependent) across all the tones before interpolation.
We now propose two column-norm based common ordering rules. In the first method, using the available channels in P (assuming $|P| \ge L+1$), obtain the time-domain multi-path channels $\{\tilde{H}_l\}_{l=0}^{L}$ via interpolation. Then with $\tilde{H}_l=[\tilde{h}_{l,1}, \ldots, \tilde{h}_{l,M_t}]$
In other words, in each tone the transmitter which is deemed to correspond to the highest received power (over all tones) is the first (or root) node in the decision tree of the D-SMC demodulator, the one with the second highest power is the second node, and so on. The motivation for defining this rule comes from the observation that the total power received from transmitter j is equal to
Note that the ordering determined in this case is independent of the sub-band S.
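The link between the tap-domain column powers and the power received over all tones can be checked numerically. The sketch below assumes the usual unnormalized DFT relation between taps and tones, in which case Parseval's identity gives equal per-transmitter totals up to the factor N; the function names and the random channel are hypothetical.

```python
import numpy as np

def tap_domain_power(taps):
    """taps: (L+1, Mr, Mt). Sum of squared column norms over all taps,
    one value per transmitter."""
    return np.sum(np.abs(taps) ** 2, axis=(0, 1))

def tone_domain_power(taps, N):
    """Per-transmitter power summed over all N tones, divided by N
    (requires N >= L+1)."""
    H = np.fft.fft(taps, n=N, axis=0)
    return np.sum(np.abs(H) ** 2, axis=(0, 1)) / N

rng = np.random.default_rng(2)
taps_p = rng.standard_normal((6, 4, 4)) + 1j * rng.standard_normal((6, 4, 4))
p_taps = tap_domain_power(taps_p)
p_tones = tone_domain_power(taps_p, 512)
common_order = np.argsort(-p_taps, kind="stable")
```

Because the two power vectors coincide, sorting by tap-domain column norms yields the same common order as sorting by the power received over the whole band, independently of which sub-band is allocated.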
The other column-norm based ordering is the non-increasing order of $\{\sum_{l \in P \cap S} \|h_{l,j}\|^2\}_{j=1}^{M_t}$
Next, we introduce the inventive per-chunk ordering rule. As mentioned earlier, since the D-SMC demodulator allows us to perform ordering on a per-tone basis, the performance of any proposed ordering rule should be compared to the optimal per-tone ordering performance. In OFDM systems, the channel matrices in any sufficiently small set of consecutive tones are highly correlated, and hence intuitively one would expect that one common ordering or permutation would be near-optimal for all the tones in that set. This simple observation forms the basis of our per-chunk ordering rule, where we propose to divide the allocated sub-band S into Q non-overlapping chunks, C1, . . . , CQ, with each chunk being a smaller contiguous set of tones and Q being the specified input to the algorithm. Over the jth chunk a common (albeit channel dependent) permutation Pj is used for all tones. To complete the description of our algorithm we need to describe a way to obtain the permutation Pj, 1≦j≦Q, for each chunk. To do so, we let BWcj denote the bandwidth of the jth chunk and let BWcoh denote the coherence bandwidth of the channel. Then let Rj ⊂ Cj ∩ P denote any set of sufficiently dispersed tones in Cj such that
A good permutation Pj can be determined as the non-increasing order of
Note that we have assumed that such a set Rj exists. Otherwise we can interpolate the channel responses of the required number of tones.
Next, we comment on the ideal number of chunks. The ideal number of chunks is defined as
where BWS denotes the band-width of the allocated sub-band. Note that over an L+1 path channel the ideal number of chunks is no greater than L+1. For the baseline system where each channel matrix is estimated prior to its QR decomposition, we recommend per-chunk ordering with the ideal number of chunks.
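The baseline system with per-chunk ordering can be sketched as below: every tone is decomposed directly, but all tones of a chunk share one permutation. The choice of the center tone as the single representative tone, the equal-size split and the function names are assumptions of this sketch.

```python
import numpy as np

def split_into_chunks(num_tones, num_chunks):
    """Contiguous, non-overlapping chunks of near-equal size."""
    return np.array_split(np.arange(num_tones), num_chunks)

def per_chunk_qr(H, num_chunks):
    """H: (T, Mr, Mt) channel matrices of a contiguous sub-band.
    Returns a list with one (order, Q, R) triple per tone."""
    T = H.shape[0]
    out = [None] * T
    for chunk in split_into_chunks(T, num_chunks):
        if chunk.size == 0:
            continue  # more chunks requested than tones available
        center = chunk[len(chunk) // 2]            # representative tone
        power = np.sum(np.abs(H[center]) ** 2, axis=0)
        order = np.argsort(-power, kind="stable")  # column-norm ordering
        for j in chunk:
            Q, R = np.linalg.qr(H[j][:, order])    # direct (brute-force) QRD
            out[j] = (order, Q, R)
    return out

rng = np.random.default_rng(3)
H_sb = rng.standard_normal((12, 4, 3)) + 1j * rng.standard_normal((12, 4, 3))
result = per_chunk_qr(H_sb, num_chunks=3)
```

Setting `num_chunks=1` gives common ordering, while setting it to the number of tones gives per-tone ordering, so the chunk count spans the range between the two.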
Per-Chunk Ordering and I-QRD
Another aspect of the invention uses per-chunk ordering and leverages interpolation based QR decomposition (I-QRD) methods for each chunk to compute the QR decompositions of the channel matrices corresponding to the data tones in it.
To describe the problem we introduce some notation. Let B1 ⊂ B2 . . . ⊂ BM
\[ q_{1,i}=q_{2,i},\quad \tilde{q}_{1,i}=\tilde{q}_{2,i}, \]
\[ r_{1,i}\doteq r_{2,i},\quad \tilde{r}_{1,i}\doteq\tilde{r}_{2,i}, \]
\[ r_{1,i}(i)=r_{2,i}(i),\quad \tilde{r}_{1,i}(i)=\tilde{r}_{2,i}(i)\quad \forall i. \tag{6} \]
We are now ready to provide our algorithms. The first algorithm requires more computations but allows for a more parallel structure.
A more efficient version of Algorithm 2 is also possible using the following idea. For each i ∈ {1, . . . , Mt} we can partition the set {1, . . . , Q} into at most Q non-empty sets $\{S_{i,k}\}_{k=1}^{m_i}$
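\[ p,q \in S_{i,k} \;\Rightarrow\; P_p(:,1:i-1) \doteq P_q(:,1:i-1) \;\&\; P_p(:,i)=P_q(:,i) \tag{15} \]
and
\[ P_p(:,1:i-1) \doteq P_q(:,1:i-1) \;\&\; P_p(:,i)=P_q(:,i) \;\Rightarrow\; \exists k: p,q \in S_{i,k} \tag{16} \]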
Once these sets are determined, at the ith step we need to compute the ith column and row of Q and R, respectively, only for one index in each of the mi sets. Also, in the case of common ordering we have Q=1 i.e. only one chunk is present and our Algorithms 1 and 2 reduce to algorithms 2 and 3 of [2], respectively.
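The partition into the sets S_{i,k} can be sketched as a grouping of chunk indices by a key built from each chunk's permutation. The sketch assumes that the relation ≐ means equality up to a reordering of columns, so that only the set of the first i−1 selected transmitters matters; permutations are represented as tuples of transmitter indices, and the function name is hypothetical.

```python
from collections import defaultdict

def partition_chunks(perms, i):
    """perms: list of Q permutations, each a tuple of transmitter indices
    in detection order. i: 1-based step index.
    Two chunks fall in the same set when their first i-1 entries agree
    as a set (assumed meaning of the dotted equality) and their i-th
    entries are identical."""
    groups = defaultdict(list)
    for chunk, perm in enumerate(perms):
        key = (frozenset(perm[:i - 1]), perm[i - 1])
        groups[key].append(chunk)
    return list(groups.values())

# Four chunks of a system with four transmitters (illustrative orders).
perms_demo = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (2, 1, 0, 3)]
sets_by_step = {i: partition_chunks(perms_demo, i) for i in range(1, 5)}
```

At step i the ith column of Q and the ith row of R then need to be computed for only one chunk index per set, and the number of sets never exceeds the number of chunks.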
Our interpolation algorithms work for any given set of chunks and associated per-chunk permutations. To decide on the optimal number of chunks, we note the following points. The complexity of the algorithm increases with the number of chunks and can be worse than that of the corresponding brute-force method if the number of chunks grows beyond a certain point, as will be shown analytically in the next section. On the other hand, as seen in the previous section, the performance of the D-SMC demodulator improves as the number of chunks increases, but the gains become negligible once the number of chunks exceeds the ideal number of chunks. Thus for given complexity constraints the optimal number of chunks per sub-band is defined to be the minimum of the ideal number of chunks and the largest number of chunks satisfying the given complexity constraints.
Complexity Analysis
In this section we conduct a complexity analysis to demonstrate the computational savings resulting from our algorithms. We consider the case without MMSE preprocessing (referred to as the ZF case) and in this case let N=Mr. In the case when MMSE preprocessing is used, recall that the augmented channel matrix per-tone has dimensions (Mr+Mt)×Mt and we set N=Mr+Mt. Following [2], we let cIP denote the cost (in terms of full multiplications) of interpolating a scalar LP. Let cQR denote the cost of QR decomposition and cM,cM−
\[ c_{QR} = \tfrac{3}{2}M_t^2 N + \tfrac{3}{2}N^2 M_t - M_t^3 - \tfrac{1}{2}M_t^2 - \tfrac{1}{2}N^2 - \tfrac{1}{2}(N+M_t), \]
cM=Mr(Mt−1)+Mt(Mt+1)/2+Mt−1, cM
In the following analysis we assume an L+1 path channel and let Np=2LMt+1 denote the number of representative tones which are contained in the set of data tones so that QR decompositions must be determined for these also. These representative tones are used to determine the ordering and also for interpolation in the I-QRD algorithms. The channel matrices of these representative tones are determined through interpolation using a set of estimated pilot channels. Let Q denote the ideal number of chunks and note that when we have one chunk, Q is also the number of representative tones needed to determine the common ordering. Then, we have that
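\[ c_{\mathrm{BF-fixed}} = D\,(M_t M_r c_{IP} + c_{QR}), \]
\[ c_{\mathrm{BF-per-tone}} = D\,(M_t M_r c_{IP} + c_{QR} + M_t M_r), \]
\[ c_{\mathrm{BF-common}} = D\,(M_t M_r c_{IP} + c_{QR}) + Q M_t M_r, \]
\[ c_{\mathrm{BF-Qchunk}} = D\,(M_t M_r c_{IP} + c_{QR}) + Q M_t M_r, \]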
cAlg1−common=(D−Np)(cM
\[ c_{\mathrm{Alg1-Qchunk}} = c_{\mathrm{Alg1-common}} + (Q-1)\,N_p\,(c_{QR} + c_M), \]
where cBF−fixed, cBF−per−tone and cBF−common denote the complexities of the baseline brute-force method with a fixed (channel-independent) order, the baseline brute-force method with per-tone ordering and baseline brute-force method with common ordering (determined from Q representative tones), respectively. cBF−Qchunk, cAlg1−common and cAlg1−Qchunk denote the complexities of the baseline brute-force method using per-chunk ordering with (ideal number) Q chunks, the first I-QRD algorithm with common ordering (determined from Q representative tones) and the first I-QRD algorithm with Q chunks, respectively.
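The brute-force expressions above are simple enough to evaluate directly. The sketch below covers only the expressions that are fully stated in this section (c_QR, c_M, the four brute-force costs, and the additional cost of the first I-QRD algorithm when going from common ordering to Q chunks); the function names and the example parameter values are hypothetical.

```python
def c_qr(Mt, N):
    """Cost of one QR decomposition, with N = Mr (ZF) or Mr + Mt (MMSE)."""
    return (1.5 * Mt**2 * N + 1.5 * N**2 * Mt - Mt**3
            - Mt**2 / 2 - N**2 / 2 - (N + Mt) / 2)

def c_m(Mr, Mt):
    return Mr * (Mt - 1) + Mt * (Mt + 1) / 2 + Mt - 1

def brute_force_costs(D, Mr, Mt, Q, c_ip=2, mmse=False):
    """D: number of data tones, Q: ideal number of chunks."""
    N = Mr + Mt if mmse else Mr
    base = D * (Mt * Mr * c_ip + c_qr(Mt, N))
    return {
        "fixed": base,
        "per_tone": base + D * Mt * Mr,
        "common": base + Q * Mt * Mr,
        "q_chunk": base + Q * Mt * Mr,
    }

def alg1_chunk_overhead(Q, L, Mr, Mt, mmse=False):
    """c_Alg1-Qchunk minus c_Alg1-common, with Np = 2*L*Mt + 1."""
    N = Mr + Mt if mmse else Mr
    Np = 2 * L * Mt + 1
    return (Q - 1) * Np * (c_qr(Mt, N) + c_m(Mr, Mt))

# Example: 4x4 system, 500 data tones, 6 paths (L = 5), Q = 6, ZF case.
costs = brute_force_costs(D=500, Mr=4, Mt=4, Q=6)
overhead = alg1_chunk_overhead(Q=6, L=5, Mr=4, Mt=4)
```

The brute-force cost with Q chunks equals that with common ordering, whereas the first I-QRD algorithm pays an overhead that grows linearly in Q−1, which is the source of the tradeoff in choosing the number of chunks.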
For the second algorithm, since the complexity is channel dependent, we provide the worst-case complexity where again for complexity computations we count the number of multiplications. Then, we first obtain
Then, we have that
cAlg2−common=(D−Np)(cM
cAlg2−Qchunk=cAlg2−common+(Q−1)E,
where cAlg2−common and cAlg2−Qchunk denote the complexities of the second I-QRD algorithm with common ordering (determined from Q representative tones) and the second I-QRD algorithm with Q chunks, respectively.
In the table below are compared the computational complexities of the inventive method for a 4×4 MIMO system using the OFDM access (512 point DFT) over a 6 path fading channel for different number of data tones. Following [2], we set cIP=2. We consider the case with MMSE processing as well as the case without it. In the first row we plot the ratio
and in the second row we plot the ratio
and in both cases we set Q=6.
The present invention has been shown and described in what are considered to be the most practical and preferred embodiments. It is anticipated, however, that departures may be made therefrom and that obvious modifications will be implemented by those skilled in the art. It will be appreciated that those skilled in the art will be able to devise numerous arrangements and variations which, although not explicitly shown or described herein, embody the principles of the invention and are within their spirit and scope.
This application claims the benefit of U.S. Provisional Application No. 60/825,936, entitled “Interpolation Based QR Decomposition for MIMO-OFDM Systems Using D-SMC Demodulator with Per Chunk Ordering”, filed on Sep. 18, 2006, the contents of which are incorporated by reference herein.
Number | Name | Date | Kind |
---|---|---|---|
7110349 | Branlund et al. | Sep 2006 | B2 |
7742536 | Burg et al. | Jun 2010 | B2 |
20070253476 | Tirkkonen et al. | Nov 2007 | A1 |
20080025336 | Cho et al. | Jan 2008 | A1 |
20080069261 | Prasad et al. | Mar 2008 | A1 |
Number | Date | Country | |
---|---|---|---|
20080069261 A1 | Mar 2008 | US |
Number | Date | Country | |
---|---|---|---|
60825936 | Sep 2006 | US |