1. Technical Field
The present invention pertains to the field of wireless communication. More particularly, the present invention pertains to wireless communication using MIMO for communication over a multipath fading communication channel.
2. Discussion of Related Art
MIMO (Multiple-Input-Multiple-Output) spatial multiplexing, i.e. using multiple antennas at both the transmitter and receiver sides of a communication channel, has recently emerged as a significant breakthrough to increase spectral efficiency in communication over wireless communication channels.
On the other hand, to support multimedia services, extensions of UMTS (Universal Mobile Telecommunication System) and CDMA2000 (Code Division Multiple Access 2000) optimized for data services have been used as a basis for MC-CDMA (Multi-Code CDMA) systems, for the High-Speed Downlink Packet Access (HSDPA) service, and for its CDMA2000 equivalents 1×EV-DV (Evolution Data and Voice) and 1×EV-DO.
MIMO spatial multiplexing according to the prior art works reasonably well for narrow-band and flat-fading communication channels. In a multipath-fading communication channel, however, the orthogonality of the spreading codes used in CDMA-based communication is destroyed and Multiple-Access-Interference (MAI) along with Inter-Symbol-Interference (ISI) is introduced. A conventional rake receiver often does not provide satisfactory performance in case of multi-path fading.
The prior art provides so-called LMMSE (Linear Minimum Mean Squared Error) based algorithms for use in a receiver; these algorithms have demonstrated fairly good performance in slow-fading channels, but only limited performance in fast-fading channels. Receivers using a Kalman filter (based on a state-space model of the dynamical system) are known to provide substantially better performance for the MIMO CDMA downlink in fast-fading environments. However, a Kalman-filter based equalizer component of a receiver has a prohibitively high complexity for real-time hardware implementation in a mobile device. A Kalman filter performs iterations in computing the Kalman gain and providing a next state prediction, and its complexity is dominated by numerous multiplications of large matrices and by the calculation of the inverse of the innovation correlation matrix used in the Kalman gain and next state prediction. For a receiver in a mobile communication device, the hardware needed to provide the processing power required for real-time Kalman filter functionality is prohibitive.
Thus, what is needed is a receiver suitable for use with MIMO and in case of a multipath fading communication channel, but with relatively low complexity in the sense of processing power needed for real-time operation.
Accordingly, in a first aspect of the invention, an apparatus is provided, comprising: a radio section, responsive to a plurality of signals wirelessly received over a communication channel using a plurality of receive antennas, for providing a received signal; and a Kalman filter, responsive to the received signal, for providing a corresponding processed signal indicating information conveyed by the received signal, responsive to a set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant, for providing a set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant.
In accord with the first aspect of the invention, the Kalman filter may include a transition and common data path, and the transition and common data path may be responsive to the set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant, and may provide the set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant. Further, the Kalman filter may include a gain calculator for providing a Kalman gain, the gain calculator comprising the transition and common data path and further comprising a Riccati processor and a Kalman gain processor, and the set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant may be provided by the transition and common data path to both the Riccati processor and the Kalman gain processor. Further still, the Kalman filter may also include a state estimator, responsive to the received signal and to the Kalman gain, for providing a filtered state estimate as the processed signal indicating information conveyed by the received signal. Also further still, the transition and common data path may also provide to the Riccati processor a set of values indicating predicted state error correlation at the later instant of time given a set of values indicating predicted state error correlation at the first instant. Still also further still, the Kalman gain processor may be responsive to a set of values indicating measurement noise at the later instant and to the set of values provided by the transition and common data path, and may provide a set of values indicating gain at the later instant.
Also in accord with the first aspect of the invention, transitions from one state to a next state and corresponding error correlations may be determined based on a state transition matrix partitioned into blocks corresponding to a displacement structure, and a shifting operation may be performed instead of a multiplication in determining values corresponding to mathematical expressions including a term in which the state transition matrix multiplies a matrix or a vector. Further, the transitions from one state to a next may be performed based on a state transition equation including the state transition matrix and based on partitioning the state transition equation into blocks with one block for each transmit antenna, and a next state may be determined using the shifting operation for each block. Also further, the filtered state estimates may be determined based on a state estimate equation with terms depending on the state transition matrix and based on partitioning the state estimate equation into blocks with one block for each transmit antenna so as to provide a state estimate equation for each transmit antenna, and a filtered state estimate may be determined using the shifting operation for each block. Still also further, a filter gain may be determined based on innovation equations with terms depending on the state transition matrix and also depending on a measurement matrix representing the communication channel and based on partitioning the measurement matrix into blocks with each block corresponding to a pair of one receive antenna and one transmit antenna, and in some cases, vector multiplication by the measurement matrix may be implemented so as to correspond to a tapped delay line. Still even also further, a conjugate-gradient algorithm and the displacement structure may be used to reduce a matrix inverse operation to one or more matrix-vector or matrix-matrix multiplications.
In a second aspect of the invention, a wireless terminal is provided, including a receiver section having an apparatus, the apparatus comprising: a radio section, responsive to a plurality of signals wirelessly received over a communication channel using a plurality of receive antennas, for providing a received signal; and a Kalman filter, responsive to the received signal, for providing a corresponding processed signal indicating information conveyed by the received signal, responsive to a set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant, for providing a set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant.
In accord with the second aspect of the invention, the wireless terminal is either a user equipment device or an entity of a radio access network of a cellular communication system wirelessly coupled to the user equipment device.
In a third aspect of the invention, a system is provided, comprising a user equipment device and an entity of a radio access network of a cellular communication system wirelessly coupled to the user equipment device, wherein at least either the user equipment device or the entity of the radio access network include an apparatus, comprising: a radio section, responsive to a plurality of signals wirelessly received over a communication channel using a plurality of receive antennas, for providing a received signal; and a Kalman filter, responsive to the received signal, for providing a corresponding processed signal indicating information conveyed by the received signal, responsive to a set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant, for providing a set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant.
In a fourth aspect of the invention, a method is provided, comprising: wirelessly receiving a plurality of signals over a communication channel using a plurality of receive antennas, for providing a received signal; and Kalman filtering the received signal so as to provide a corresponding processed signal indicating information conveyed by the received signal, based on processing a set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant so as to provide a set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant.
In a fifth aspect of the invention, a computer program product is provided, comprising a computer readable storage structure embodying computer program code thereon for execution by a computer processor, wherein said computer program code comprises instructions for performing a method comprising: wirelessly receiving a plurality of signals over a communication channel using a plurality of receive antennas, for providing a received signal; and Kalman filtering the received signal so as to provide a corresponding processed signal indicating information conveyed by the received signal, based on processing a set of values indicating predicted state error correlation at a first instant of time given all noise estimates up through the first instant so as to provide a set of values indicating a product of measurement values and predicted state error correlation at a later instant of time given all process noise estimates up through the later instant.
The above and other objects, features and advantages of the invention will become apparent from a consideration of the subsequent detailed description presented in connection with the accompanying drawings.
The invention provides an efficient VLSI (Very Large Scale Integration) oriented recursive architecture for a MIMO Kalman equalizer in a receiver for CDMA-based communications. As described below, many redundant matrix-matrix multiplications are eliminated in the block-displacement structure (by using a simple data loading process), substantially lowering the complexity and therefore the processing power requirements of the hardware used by the receiver. A divide-and-conquer methodology is applied to partition the MIMO displacement structure into more tractable sub block architectures in the Kalman recursion. By utilizing the block Toeplitz structure of the channel matrix, an FFT-based acceleration is proposed to avoid direct matrix-matrix multiplications in the time domain for predicted state-error correlation matrix and Kalman gain. Finally, an iterative Conjugate-Gradient (CG) based algorithm is proposed to avoid the inverse of the Hermitian innovation correlation matrix in the Kalman gain computer. Further, the data path in the receiver is streamlined by combining the displacement and block-Toeplitz structure. The proposed architecture not only reduces the numerical complexity to O(N·log N) (i.e. order of N·log N) per chip, but also facilitates parallel and pipelined real-time VLSI implementations. The potential performance gain can be more than 2 dB, compared with existing FIR (Finite Impulse Response) LMMSE solutions. Possible applications of the invention include downlink CDMA mobile devices that comply with either CDMA2000 or a WCDMA standard.
The performance of a receiver according to the invention is better than that of a receiver using a FIR LMMSE chip equalizer, and its tracking capability is also superior. Its computational complexity is reduced significantly from that of a receiver using a conventional Kalman filter because of the above-mentioned displacement structure, an FFT acceleration, and an iterative inverse solver, all described in more detail below.
1. Review and Notation
By way of background, a so-called Toeplitz matrix is any n×n matrix with values constant along each (top-left to lower-right) diagonal. That is, a Toeplitz matrix has the form
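For reference, that constant-diagonal form can be written as follows (a standard rendering of the definition just given, with entries T_{i,j} = t_{i−j}):

```latex
T = \begin{bmatrix}
t_0     & t_{-1}  & t_{-2}  & \cdots & t_{-(n-1)} \\
t_1     & t_0     & t_{-1}  & \cdots & t_{-(n-2)} \\
t_2     & t_1     & t_0     & \cdots & t_{-(n-3)} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots     \\
t_{n-1} & t_{n-2} & t_{n-3} & \cdots & t_0
\end{bmatrix},
\qquad T_{i,j} = t_{\,i-j}.
```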
Numerical problems involving Toeplitz matrices typically have fast solutions. For example, the inverse of a symmetric, positive-definite n×n Toeplitz matrix can be found in O(n²) time.
Regarding the use of so-called displacement structure in calculations involving matrices, a displacement operator ∇ is sometimes defined as mapping an m×n matrix V into a new m×n matrix as follows: ∇(V)=DV−VA, where D is some m×m matrix and A is some n×n matrix. The ∇-displacement rank of V is defined as the rank of the matrix ∇(V). If this rank is α, then ∇(V) can be written as GB with G an m×α matrix and B an α×n matrix. The pair (G, B) is then called a ∇-generator for V. If α is small and D and A are sufficiently simple, then the ∇-operator allows compressing the matrix V to a pair of matrices with a total of α(m+n) entries. It turns out that one can efficiently compute matrix expressions using the compressed form.
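As a small numerical illustration (not taken from the text above; numpy-based, with D and A both chosen as the down-shift matrix, a common choice for a Toeplitz V), the displacement rank can be checked directly:

```python
import numpy as np

def downshift(n):
    """n x n matrix Z with ones on the first subdiagonal (Z @ v shifts v down by one)."""
    return np.diag(np.ones(n - 1), k=-1)

# Build a random n x n Toeplitz matrix V with V[i, j] = t[i - j].
n = 6
rng = np.random.default_rng(0)
t = rng.standard_normal(2 * n - 1)                       # t_{-(n-1)} ... t_{n-1}
V = np.array([[t[n - 1 + i - j] for j in range(n)] for i in range(n)])

D = downshift(n)
A = downshift(n)
nabla_V = D @ V - V @ A                                  # displacement: nabla(V) = D V - V A

alpha = np.linalg.matrix_rank(nabla_V)
print("displacement rank:", alpha)                       # at most 2 for a Toeplitz matrix
```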
First, we introduce the notation used in the description here, and also review the basics of the procedure used in processing by a Kalman filter. We consider the system model of the MIMO CDMA downlink based on spatial multiplexing with M Tx antennas and Nr Rx antennas. First, the high data rate symbols are demultiplexed into U·M lower-rate substreams, where U is the number of spreading codes used in the system for data transmission. The substreams are broken into M groups, where each substream in a group is spread with a spreading code of spreading factor F. The groups of substreams are then combined, scrambled with long scrambling codes, and transmitted, the tth group through the tth Tx antenna. The chip level signal at the tth transmit antenna is given by
x_t[i] = Σ_{u=1}^{U} s_t^u[j]·c_t^u[i] + s_t^P[j]·c_t^P[i]  (1)
where j is the symbol index, i is the chip index and u is the index of the composite spreading code. s_t^u[j] is the jth symbol of the uth code at the tth substream. In the following, we focus on the jth symbol and omit the index for simplicity. c_t^u[i] = c^u[i]·c_t^(s)[i] is the composite spreading code sequence for the uth code at the tth substream, where c^u[i] is the user-specific Hadamard spreading code and c_t^(s)[i] is the antenna-specific scrambling long code. s_t^P[j] denotes the pilot symbol at the tth antenna, and c_t^P[i] = c^P[i]·c_t^(s)[i] is the composite spreading code for the pilot symbols at the tth antenna. The received chip level signal at the rth Rx antenna is given by
The channel is characterized by a channel matrix between the tth Tx antenna and the rth Rx antenna as
By collecting the F consecutive chips at the kth symbol from each of the Nr Rx antennas in a signal vector y_r(k) = [y_r(kF+F−1), . . . , y_r(kF)]^T and packing the signal vectors from each receive antenna, we form a signal vector y(k) = [y_1(k)^T, . . . , y_r(k)^T, . . . , y_{Nr}(k)^T]^T. Here F is the spreading gain. In vector form, the received signal can be given by
where v_r(k) is the additive Gaussian noise. The transmitted chip vector for the tth transmit antenna is given by x_t(k) = [x_t(kF+F−1) . . . x_t(kF) . . . x_t(kF−D)]^T, and the overall transmitted signal vector is given by stacking the substreams for the multiple transmit antennas as x(k) = [x_1(k)^T . . . x_t(k)^T . . . x_M(k)^T]^T. D is the channel delay spread. The channel matrix for the rth receive antenna across the multiple transmit antennas is defined as H_r(k) = [H_{1,r}(k) . . . H_{t,r}(k) . . . H_{M,r}(k)], where H_{t,r}(k) is the channel matrix from the tth transmit antenna to the rth receive antenna.
The Kalman filter theory provides a recursive computation structure and best linear unbiased estimate (BLUE) of the state of a linear, discrete time dynamical system. However, the application of the theory in a particular field depends on the choice of the state-space model, which is not specified in the fundamental theory. Here, we follow a model that associates the Kalman theory with the MIMO CDMA downlink equalization.
The Kalman filter estimates the state x given the entire observed data y(1), . . . , y(k). The Kalman filter is derived from a state-space model consisting of a measurement equation and a process equation. The measurement equation describes the generation of the observation y from the state x in a stochastic noise process. The process equation describes the state transition to the new estimate x(k) at time k from the estimate x(k−1) at time k−1. It is assumed that the transition matrix satisfies the product rule and the inverse rule.
By defining the transition matrix as Θ(k), it is natural to have the measurement equation as the received signal model and the process equation as an excitation of some process noise,
y(k)=H(k)x(k)+v(k) (5)
x(k)=Θ(k)x(k−1)+w(k) (6)
where the measurement matrix is the overall MIMO channel matrix H(k) given by H(k) = [H_1(k)^T . . . H_r(k)^T . . . H_{Nr}(k)^T]^T, v(k) denotes the measurement noise, and w(k) denotes the process noise.
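As a concrete, purely illustrative sketch of this state-space model, the following numpy code builds one plausible block form of H(k) and Θ(k) and runs eqs. (5)-(6) once; the sizes, the random taps, and the exact per-block layouts are assumptions consistent with the definitions above, not the patent's reference implementation.

```python
import numpy as np

M, Nr, F, D = 2, 2, 8, 3          # Tx antennas, Rx antennas, spreading factor, delay spread
rng = np.random.default_rng(0)

def toeplitz_block(h):
    """F x (F+D) block H_{t,r}: row m carries taps h_0 .. h_D in columns m .. m+D."""
    Htr = np.zeros((F, F + D), dtype=complex)
    for m in range(F):
        Htr[m, m:m + D + 1] = h
    return Htr

# Overall measurement matrix H(k): (Nr*F) x (M*(F+D)), block (r, t) is H_{t,r}.
taps = rng.standard_normal((M, Nr, D + 1)) + 1j * rng.standard_normal((M, Nr, D + 1))
H = np.block([[toeplitz_block(taps[t, r]) for t in range(M)] for r in range(Nr)])

# Per-antenna transition block: F rows of zeros on top, old upper D chips moved to the bottom.
Theta_t = np.zeros((F + D, F + D))
Theta_t[F:, :D] = np.eye(D)
Theta = np.kron(np.eye(M), Theta_t)             # block-diagonal over the M Tx antennas

# One step of the process / measurement equations.
x_prev = rng.standard_normal(M * (F + D)) + 1j * rng.standard_normal(M * (F + D))
w = rng.standard_normal((M, F + D)) + 1j * rng.standard_normal((M, F + D))
w[:, F:] = 0                                    # process noise excites only the F new chips
w = w.reshape(-1)
x = Theta @ x_prev + w                          # eq. (6)
v = 0.1 * (rng.standard_normal(Nr * F) + 1j * rng.standard_normal(Nr * F))
y = H @ x + v                                   # eq. (5)
```

The block Θ̃ used here is the shift form discussed in the displacement-structure section below.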
2. Kalman Filter According to the Invention
a. Common Data Path
Based on an examination of the timing dependency and the physical meaning of the Kalman procedure, the invention provides a reduced or streamlined Kalman filter, so-called because a Kalman filter according to the invention includes a data path common to different components of the filter.
The Kalman filter solves the process and the measurement equations jointly for the unknown states in an optimal manner. An innovation process and the correlation matrix of the innovation process are defined by
α(k) = y(k) − ŷ(k|k−1)  (7)
R(k) = E[α(k) α^H(k)]  (8)
The innovation α(k) represents the new information in the observed data y(k); ŷ(k|k−1) denotes the MMSE estimate of the observed data y(k) at time k, given all the past observed data from time 1 to k−1. It can be shown that
R(k) = H(k) P(k|k−1) H^H(k) + Q_v(k)  (9)
where the P(k|k−1) matrix is the predicted state error correlation matrix defined by
P(k|k−1) = E[ε(k|k−1) ε^H(k|k−1)]  (10)
Here ε(k|k−1) = x(k) − x̂(k|k−1) is the predicted state error vector at time k, using data up to time k−1. By defining the Kalman gain as G(k) = E[x(k) α^H(k)] R^−1(k), the new state estimate can be given by
x̂(k|k) = Θ(k) x̂(k−1|k−1) + G(k) α(k).  (11)
The physical meaning is that we may compute the MMSE estimate x̂(k|k) of the state of a linear dynamical system by adding a correction term G(k)α(k) to the previous estimate pre-multiplied by the transition matrix Θ(k). The Riccati equation provides a recursive computation procedure for the predicted state error correlation matrix P(k|k−1) and the Kalman gain. By analyzing the data dependency and the timing relationship, the streamlined procedure is given in Table I.
One thing to note here is that, in most Kalman filter notations, the innovation correlation matrix is generated by first pre-multiplying by the measurement matrix and then post-multiplying by the Hermitian transpose of the measurement matrix, as R(k) = H(k) P(k|k−1) H^H(k) + Q_v(k). However, it is easy to show that P(k|k−1) is Hermitian symmetric, so we introduce an intermediate matrix Ω(k) = H(k) P(k|k−1) obtained by pre-multiplying by the channel matrix only. R(k) is then formed generically as R(k) = H(k) Ω^H(k) + Q_v(k). This change of form facilitates the complexity reduction shown later.
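To make the shared data path concrete, the following is a minimal dense-algebra sketch (generic shapes; no displacement, FFT, or CG shortcuts yet; the function and variable names are illustrative, and np.linalg.inv is used only for brevity) of one recursion step built around the common intermediate Ω(k) = H(k)P(k|k−1):

```python
import numpy as np

def kalman_step(x_prev, P_prev, y, H, Theta, Qw, Qv):
    """One streamlined Kalman recursion built around Omega(k) = H(k) P(k|k-1)."""
    # Prediction (process equation).
    x_pred = Theta @ x_prev                              # x^(k|k-1)
    P_pred = Theta @ P_prev @ Theta.conj().T + Qw        # P(k|k-1)

    # Common data path: Omega feeds both the innovation correlation and the gain.
    Omega = H @ P_pred                                   # Omega(k)
    R = H @ Omega.conj().T + Qv                          # eq. (9), using Omega^H
    G = Omega.conj().T @ np.linalg.inv(R)                # Kalman gain G(k) = Omega^H R^-1

    # Correction (measurement update).
    alpha = y - H @ x_pred                               # innovation, eq. (7)
    x_filt = x_pred + G @ alpha                          # filtered state estimate
    P_filt = P_pred - G @ Omega                          # Riccati update of P(k|k)
    return x_filt, P_filt
```

In the full architecture described below, the dense multiplications by Θ(k) and H(k) and the explicit inverse of R(k) are replaced by the shift-based displacement operations, the FFT acceleration, and the conjugate-gradient solver, respectively.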
The filter 12 provided by the invention can be implemented in various ways, as is known in the art. For example, the filter can be provided as a special-purpose integrated circuit, i.e. an application specific integrated circuit (ASIC) or a combination of ASICs, or can be implemented as software executing on a more general-purpose integrated circuit (e.g. a general-purpose microprocessor). The filter 12 and radio section 11 can be components of a user equipment device (such as a mobile phone or any other kind of wireless terminal equipped for cellular communication), or can be components of a service access point (SAP) of a radio access network (RAN) of a cellular communication network. The invention is best implemented in the same manner as most other equalization algorithms for wireless receivers: typically as software embedded in a general-purpose signal processing chip, or, for the faster alternative, in an ASIC. Like any filter using a non-blind equalization method, a filter according to the invention requires that an estimate of the channel parameters be available and is thus only part of a comprehensive receiver, which also includes a channel estimator.
The logic block diagram of the VLSI-oriented architecture according to the invention is organized as follows.
On the bottom is the feedback loop of the predicted state error correlation P(k|k) and the Kalman gain computer. Similarly, a MUX first selects either the initial value P(0|0) or the delayed feedback value P(k|k) as P(k−1|k−1). It is pre-multiplied and then post-multiplied by the transition matrix, and the correlation of the process noise Q_w(k) is added to form an intermediate correlation P(k|k−1). This is pre-multiplied by H(k) to generate Ω(k), whose result is then Hermitian transposed. Note that the Hermitian transpose is a virtual operation with no time/memory resource usage, because the subsequent operations can use the structure of Ω^H(k) explicitly. Ω^H(k) is pre-multiplied by H(k) and the result is added to the measurement noise correlation matrix Q_v(k) to form the innovation correlation R(k). The Kalman gain is produced as the result of the pre-multiplication of Ω^H(k) with the inverse of R(k). P(k|k) is then updated in the Riccati processor accordingly. In this streamlined data path, the commonality between the Riccati processor and the Kalman gain processor is extracted as the dotted gray area, and the timing and dependency relationship between the blocks is indicated. The recursive structure is reduced to several pre/post-multiplications by the transition matrix, the pre-multiplications by the channel matrix, and one inverse.
b. MIMO Displacement Structure
Despite the streamlining to reduce redundancy, the computational complexity remains of the same order: both the matrix inverse and the matrix-matrix multiplication have O(N³) complexity for an N×N matrix. It turns out that because the transition matrix has a displacement structure, the matrix multiplication complexity can be dramatically reduced.
It has been shown that the transition matrix can be expressed as follows, indicating a block displacement structure:
where F is the spreading factor of the spreading codes, D is the channel delay spread, I_M is the M×M identity matrix, ⊗ is the Kronecker product, and M is the number of transmit antennas (note that M has no relationship to F and D). Using the Kronecker product, the matrix in (12) can be expanded block-wise over the M transmit antennas.
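One expansion consistent with the shift behavior described in the next paragraph, assumed here for concreteness since only the block sizes are specified above, is:

```latex
\Theta(k) = I_M \otimes \tilde{\Theta}(k),
\qquad
\tilde{\Theta}(k) =
\begin{bmatrix}
0_{F \times D} & 0_{F \times F} \\
I_D            & 0_{D \times F}
\end{bmatrix}
\in \mathbb{R}^{(F+D)\times(F+D)} .
```

With this form, multiplying by Θ̃(k) amounts to the row/column shifting described next.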
The process noise is then given by w(k) = [w_1(k)^T . . . w_t(k)^T . . . w_M(k)^T]^T, where the process noise for the tth transmit antenna is given by w_t(k) = [x_t(kF+F−1) . . . x_t(kF) 0 . . . 0]^T. It is easy to verify that to pre-multiply a matrix by Θ̃(k) is equivalent to shifting the first D rows of the matrix to the bottom and adding F rows of zeros in the upper portion. To post-multiply a matrix by Θ̃^H(k), as occurs in the Riccati recursion, is equivalent to shifting the first D columns of the matrix to the right and adding F columns of zeros in the left portion. For the MIMO case, this feature forms a block-displacement structure and is applied to the related computations below.
b-1. State Transition Equation
It can be shown that the state transition equation can be partitioned into M sub-blocks, one per transmit antenna, using the Kronecker product: x̂(k|k−1) = [x̂_1(k|k−1)^T . . . x̂_t(k|k−1)^T . . . x̂_M(k|k−1)^T]^T. Thus, the tth sub-block of the transition is given by

x̂_t(k|k−1) = Θ̃(k) x̂_t(k−1|k−1) = [0_{1×F}  x̂_t^U(k−1|k−1)^T]^T

where

x̂_t^U(k−1|k−1) ≡ [x̂_t(k−1|k−1, 0) . . . x̂_t(k−1|k−1, D−1)]^T

is the upper D rows of the previous state. This partitioning process is illustrated in the accompanying drawings and sketched below: each block of the predicted state is obtained by zero-filling its top F entries and copying the upper D entries of the previous state into its bottom D entries, with no matrix-vector multiplication.
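A minimal numerical check of this copy-based transition for one transmit antenna (sizes assumed for illustration):

```python
import numpy as np

F, D = 8, 3
rng = np.random.default_rng(1)
x_prev = rng.standard_normal(F + D)                  # x^_t(k-1|k-1)

# Explicit per-antenna transition block, kept only for comparison.
Theta_t = np.zeros((F + D, F + D))
Theta_t[F:, :D] = np.eye(D)

# Shift/copy implementation: F zeros on top, the upper D entries of the old state below.
x_pred = np.concatenate([np.zeros(F), x_prev[:D]])   # x^_t(k|k-1)

assert np.allclose(x_pred, Theta_t @ x_prev)         # same result, no multiplication
```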
b-2. Filtered State Estimation Output & Feedback
This displacement structure can be further applied in the filtered state estimation and feedback process. Similarly, we can partition the update equation

x̂(k|k) = x̂(k|k−1) + G(k)α(k)

into

x̂_t(k|k) = x̂_t(k|k−1) + G_t(k)α(k)

where x̂(k|k) = [. . . x̂_t(k|k)^T . . .]^T and G(k) = [. . . G_t(k)^T . . .]^T. We further partition the element-wise state estimate and the Kalman gain into three sub-blocks, the upper D rows, the lower D rows, and the remaining rows in the center, as

x̂_t(k|k) = [x̂_t^U(k|k)^T  x̂_t^C(k|k)^T  x̂_t^L(k|k)^T]^T and G_t(k) = [G_t^U(k)^T  G_t^C(k)^T  G_t^L(k)^T]^T.

We define the effective transition state vector x_t^L(k−1) as the part of the state at time (k−1) that survives the transition, i.e. the upper D rows of x̂_t(k−1|k−1), which land in the lower D rows of the predicted state. It can be shown from the transition that the upper and center portions of the new state need no contribution from the previous state; only the lower portion is updated from the previous state together with the Kalman gain term. The new effective transition state vector is then simply a copy of the new upper portion of the state. In the real-time implementation, only this portion is stored and fed back to form the state transition, according to the following:

x̂_t^U(k|k) = G_t^U(k)α(k)
x̂_t^C(k|k) = G_t^C(k)α(k)
x̂_t^L(k|k) = x_t^L(k−1) + G_t^L(k)α(k)
x_t^L(k) = x̂_t^U(k|k)  (13)

This allows the feedback to begin before the whole vector is ready. The transition matrix-vector multiplication and part of the vector addition are eliminated, and the storage of the transition vector is also reduced.
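The following short sketch (assumed sizes; the innovation length Nr·F is chosen arbitrarily) spells out the partitioned update of eq. (13) for one transmit antenna:

```python
import numpy as np

F, D, NrF = 8, 3, 16                         # per-antenna state has F+D rows; alpha has Nr*F rows
rng = np.random.default_rng(2)

G_t = rng.standard_normal((F + D, NrF))      # per-antenna slice of the Kalman gain G_t(k)
alpha = rng.standard_normal(NrF)             # innovation alpha(k)
x_L_prev = rng.standard_normal(D)            # effective transition vector x_t^L(k-1)

Ga = G_t @ alpha
x_U = Ga[:D]                                 # upper D rows: gain term only
x_C = Ga[D:F]                                # center rows: gain term only
x_L = x_L_prev + Ga[F:]                      # lower D rows: previous state plus gain term

x_filt = np.concatenate([x_U, x_C, x_L])     # x^_t(k|k)
x_L_next = x_U.copy()                        # feedback for step k+1 (last line of eq. (13))
```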
b-3. Predicted State Error Correlation Matrix
Another computation involving the transition matrix is that of the predicted state error correlation matrix P(k|k−1) = Θ(k) P(k−1|k−1) Θ^H(k) + Q_w(k). It is shown that the process noise correlation is given by
Thus, if we span the MIMO correlation matrix from its sub-blocks as P(k|k−1) = span{P_{t1,t2}(k|k−1)} and P(k−1|k−1) = span{P_{t1,t2}(k−1|k−1)} for t1 and t2 = 1 to M, we obtain the partitioned sub-blocks given by
P_{t1,t2}(k|k−1) = Θ̃(k) P_{t1,t2}(k−1|k−1) Θ̃^H(k) + Q_{w,t1,t2}(k)
By span{P_{t1,t2}(k|k−1)}, we mean that the matrix P(k|k−1) is formed from the sub-matrices P_{t1,t2}(k|k−1), where t1 and t2 are the sub-block row and column indices.
Using the shift behavior of pre-multiplication and post-multiplication by the displacement transition matrix, we can show that the new state error correlation matrix is given by the following partitioning,
where the sub-block matrix ρ_{t1,t2}(k−1|k−1) is the upper-left D×D corner of P_{t1,t2}(k−1|k−1), which the shift operations place in the lower-right D×D corner of P_{t1,t2}(k|k−1):

ρ_{t1,t2}(k−1|k−1) ≡ [P_{t1,t2}(k−1|k−1)]_{1:D, 1:D}.
Thus, the matrix multiplications and additions in computing P(k|k−1) from P(k−1|k−1) are all eliminated. Logically, we only need to copy some small sub-blocks of P(k−1|k−1) into Q_w(k) following this special pattern. Moreover, storage of the full matrix is not necessary, since the matrix is sparse with many zero entries. This displacement procedure is demonstrated by the data loading process illustrated in the accompanying drawings and sketched below.
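A minimal numerical sketch of that data loading step for one sub-block (the sizes and the process-noise block are assumptions for illustration):

```python
import numpy as np

F, D = 8, 3
rng = np.random.default_rng(3)

P_prev = rng.standard_normal((F + D, F + D))              # P_{t1,t2}(k-1|k-1)
P_prev = 0.5 * (P_prev + P_prev.T)                        # keep it symmetric for the check
Qw = np.zeros((F + D, F + D)); Qw[:F, :F] = np.eye(F)     # assumed process-noise block

# Data loading: copy the old upper-left D x D corner into the lower-right corner of Qw.
P_pred = Qw.copy()
P_pred[F:, F:] += P_prev[:D, :D]

# Reference computation with the explicit transition block.
Theta_t = np.zeros((F + D, F + D)); Theta_t[F:, :D] = np.eye(D)
assert np.allclose(P_pred, Theta_t @ P_prev @ Theta_t.T + Qw)
```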
b-4. Update State Error Correlation Matrix
Jointly considering the feedback data path of P(k|k) and the displacement structure of P(k|k−1), it is clear that only the upper-left D×D corner ρ_{t1,t2}(k|k) of each sub-block of P(k|k) is ever loaded into the next P(k|k−1). Instead of computing the full matrix P(k|k), we only need to compute the relevant sub-matrices given by

ρ_{t1,t2}(k|k) ≡ [P_{t1,t2}(k|k)]_{1:D, 1:D}.
We also partition the Kalman gain G(k) and the Ω(k) matrix into MIMO sub-blocks, where G_{t,r}(k) is the (F+D)×F sub-block of the gain for the tth transmit and rth receive antenna and Ω_{r,t}(k) is the F×(F+D) sub-block of Ω(k) for the rth receive and tth transmit antenna, each further partitioned as:

G_{t,r}^U(k): upper, D×F    Ω_{r,t}^L(k): left, F×D
G_{t,r}^L(k): lower, F×F    Ω_{r,t}^R(k): right, F×F
It is clear that the element block of Ψ(k) = G(k)Ω(k) (the correction term in the Riccati update) is given by

Ψ_{t1,t2}(k) = Σ_{r=1}^{Nr} G_{t1,r}(k) Ω_{r,t2}(k).

Comparing with the displacement structure, only the upper-left corner of size D×D is necessary, and it is given by

[Ψ_{t1,t2}(k)]_{1:D,1:D} = Σ_{r=1}^{Nr} G_{t1,r}^U(k) Ω_{r,t2}^L(k).

This involves only the upper part of G_{t1,r}(k) and the left part of Ω_{r,t2}(k). In summary, the updated effective state error correlation is obtained simply by applying this correction term to the D×D corner of Q̃_w(k), which is constant with respect to the transmit antenna indices t1 and t2, according to:

ρ_{t1,t2}(k|k) = [Q̃_w(k)]_{1:D,1:D} − Σ_{r=1}^{Nr} G_{t1,r}^U(k) Ω_{r,t2}^L(k).
This optimization not only saves many computations and much memory storage but also shortens the update and feedback time.
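A short sketch of that corner update (the sizes, the number of receive antennas, and the Q̃_w corner are assumed for illustration):

```python
import numpy as np

F, D, Nr = 8, 3, 2
rng = np.random.default_rng(4)

Qw_corner = np.eye(D)                                    # assumed D x D corner of Q~_w(k)
G_blocks = [rng.standard_normal((F + D, F)) for _ in range(Nr)]        # G_{t1,r}(k)
Omega_blocks = [rng.standard_normal((F, F + D)) for _ in range(Nr)]    # Omega_{r,t2}(k)

rho = Qw_corner.copy()
for r in range(Nr):
    rho -= G_blocks[r][:D, :] @ Omega_blocks[r][:, :D]   # G^U_{t1,r} @ Omega^L_{r,t2}
# rho is the effective rho_{t1,t2}(k|k) fed back into the next data-loading step.
```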
c. FFT Acceleration
The invention also provides an FFT-based architecture. In the generation of the innovation and of the omega matrix Ω(k), there are pre-multiplications by the channel matrix H(k), as in α(k) = y(k) − H(k)x̂(k|k−1) and Ω(k) = H(k)P(k|k−1). It can be shown that the channel matrix has a block form in which each block is Toeplitz. We define the estimated observation ŷ(k) = H(k)x̂(k|k−1) and partition it into sub-vectors for the multiple receive antennas as ŷ(k) = [ŷ_1(k)^T . . . ŷ_r(k)^T . . . ŷ_{Nr}(k)^T]^T.
Since the channel matrix H_{t,r}(k) from the tth transmit antenna to the rth receive antenna has a banded Toeplitz structure, with the channel taps constant along its diagonals, the matrix-vector multiplication can be viewed as an FIR filter with the channel impulse response [h_{t,D}^r, . . . , h_{t,0}^r]. This can be implemented in the time domain by a tapped delay line architecture, as for a conventional FIR filter. It is well known that time-domain FIR filtering can also be implemented by FFT-based circular convolution in the frequency domain. In [ ], an "overlap-save" based matrix-vector multiplication architecture is proposed to accelerate the computation. A similar architecture can be applied directly to the Kalman filtering problem described here. This yields O((F+D)·log(F+D)) complexity versus O((F+D)²) for the matrix-vector multiplication, and O((F+D)²·log(F+D)) versus O((F+D)³) for the matrix-matrix multiplications in the innovation estimation and the Kalman gain computer. The procedure is described briefly as follows.
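For illustration, the following sketch checks the FFT route for a single banded Toeplitz block (sizes and taps assumed; a single-block circular convolution is shown rather than the streaming overlap-save variant, and the row layout of H_{t,r} is the one assumed earlier):

```python
import numpy as np

F, D = 8, 3
rng = np.random.default_rng(5)
h = rng.standard_normal(D + 1) + 1j * rng.standard_normal(D + 1)   # channel taps h_0 .. h_D
x = rng.standard_normal(F + D) + 1j * rng.standard_normal(F + D)   # per-antenna state vector

# Direct time-domain product for reference (O((F+D)^2) as a dense matrix-vector product).
H = np.zeros((F, F + D), dtype=complex)
for m in range(F):
    H[m, m:m + D + 1] = h
y_direct = H @ x

# FFT-based product: the banded product equals a slice of the linear convolution of x
# with the reversed taps, computed here as a circular convolution of length >= F + 2*D.
N = 1 << int(np.ceil(np.log2(F + 2 * D)))                          # FFT length (power of two)
c = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h[::-1], N))
y_fft = c[D:D + F]

assert np.allclose(y_direct, y_fft)
```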
The FFT-based architecture for the MIMO observation estimate ŷ(k) = H(k)x̂(k|k−1) is depicted in the accompanying drawings.
d. Computation Update Rate
We discuss here the computation rate of the FFT coefficients. For all the H(k) pre-multiplications within each iteration k, the element-wise FFT of H(k) needs to be computed only once; only the FFTs of the right-hand factors and the corresponding frequency-domain products must be computed individually. Specifically, for the two H(k) multiplications above (the observation estimate and the generation of Ω(k)), the element-wise FFT of H_{t,r}(k) needs to be computed only once for both. Moreover, even in a fast-fading environment, it is most likely that the channel impulse response can be assumed constant over many symbols in a frame. Thus, over the coherence time for which the channel coefficients are quasi-static, i.e. H(k) = H, the FFTs need to be computed only once. For each symbol, there is one dimension-wise (over the transmit antennas) FFT bank for the estimated state x̂_t(k|k−1) and one dimension-wise (over the receive antennas) IFFT bank for the observation estimate ŷ_r(k). For the computation of the Ω(k) matrix, there are element-wise FFT banks for P(k|k−1). However, examination of the structure of P(k|k−1) shows that the first F columns of each P_{t1,t2}(k|k−1) are constant if Q_w(k) is assumed constant over the observation frame; only the bottom-right D×D corner of P_{t1,t2}(k|k−1) varies with k. This holds even in a fast-fading environment, since Q_w(k) is an input supplied once per frame. Thus, only D columns need to be recomputed for each k. If we further assume that

P_{t1,t2}(k|k−1) = 0 for t1 ≠ t2,

i.e. only the diagonal blocks of P(k|k−1) are effective, the computation simplifies to

Ω_{r,t}(k) = H_{t,r} P_{t,t}(k|k−1).  (27)
It is seen that this simplification does not degrade the performance significantly. In summary, the computation procedure, with the different computation rates, is given in Table II.
The computation of the correlation matrix of the innovation,

R(k) = H(k) Ω^H(k) + Q_v(k),

is also accelerated by the FFT-based computing architecture in the frequency domain, after the change of order enabled by the Hermitian property of P(k|k−1) and Q_v(k). The procedure is similar to that of Table II for the computation of Ω^H(k). Thus, the direct matrix computations involving the channel matrix H(k) are replaced by the FFT-based procedure, which reduces the complexity and facilitates parallel processing in VLSI architectures.
e. Iterative Inverse Solver
With the aforementioned optimizations, the complexity has been reduced dramatically. However, one last difficult task remains: computing the Kalman gain G(k) = Ω^H(k) R^−1(k).
It is known that Gaussian elimination can be applied to solve the matrix inverse with complexity of the order of O((NF)³), where N is the number of receive antennas and F is the spreading factor. Moreover, a Cholesky decomposition can be applied to increase the speed by reducing the hidden constant factor in the complexity order. However, since these two solutions do not use the structure of the matrix, their complexity is of the same order as for inverting a general matrix.
We make two observations. First, R(k) is an (NF×NF) Hermitian symmetric matrix. This can easily be verified as R^H(k) = Ω(k)H^H(k) + Q_v^H(k) = H(k)P(k|k−1)H^H(k) + Q_v(k) = R(k), because P(k|k−1) = P^H(k|k−1) and Q_v(k) = Q_v^H(k) are themselves Hermitian symmetric. It is known that the iterative CG (conjugate-gradient) algorithm can solve systems involving this type of matrix more efficiently. Second, from the displacement structure of the state transition matrix, the full Kalman gain matrix G is not necessary: only the lower D×NF portion (G_t^L) and the upper-left corner (G_{t,r}^U) are required. This feature can also be used to optimize the matrix inverse and the matrix multiplication involved in the Kalman gain computation.
To avoid directly computing the inverse of R, the iterative CG algorithm is used, and the Kalman gain computation and the state update are re-partitioned to generate the following new problems:
X(k) = G(k)α(k) = Ω^H(k)[R^−1(k)α(k)] = Ω^H(k)Φ(k)

Ψ(k) = G(k)Ω(k) = Ω^H(k)[R^−1(k)Ω(k)] = Ω^H(k)Π(k)

where Φ(k) = R^−1(k)α(k) and Π(k) = R^−1(k)Ω(k), respectively. With this changed order of computation, the iterative procedure of the CG-based algorithm is described next.
f. Computation of Φ(k) = R^−1(k)α(k)
The computation of Φ(k) is a direct application of the iterative CG algorithm. The procedure is shown in Table III.
In Table III, the quantities μ, ν, and δ_j are scalars, and Γ_j, Δ_{j−1}, and Φ_j are vectors. Also, RΔ_{j−1} is a matrix-vector multiplication. Finally, Δ_{j−1}^H Γ_j and Γ_j^H Γ_j are inner products of two vectors.
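The following generic conjugate-gradient sketch (standard CG with illustrative sizes and a randomly generated Hermitian positive definite R; the variable names do not follow Table III) shows how Φ(k) = R^−1(k)α(k) can be obtained without ever forming the inverse:

```python
import numpy as np

def cg_solve(R, b, n_iter=200, tol=1e-16):
    """Solve R x = b for Hermitian positive definite R by conjugate gradients."""
    x = np.zeros_like(b)
    r = b - R @ x                              # residual
    p = r.copy()                               # search direction
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        Rp = R @ p                             # the only matrix-vector product per iteration
        step = rs / np.vdot(p, Rp).real
        x = x + step * p
        r = r - step * Rp
        rs_new = np.vdot(r, r).real
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: R = H P H^H + Qv is Hermitian positive definite, mimicked here by A A^H + I.
rng = np.random.default_rng(6)
NrF = 16
A = rng.standard_normal((NrF, NrF)) + 1j * rng.standard_normal((NrF, NrF))
R = A @ A.conj().T + np.eye(NrF)
alpha = rng.standard_normal(NrF) + 1j * rng.standard_normal(NrF)

Phi = cg_solve(R, alpha)                       # Phi(k) = R^{-1}(k) alpha(k), matrix-free
assert np.linalg.norm(R @ Phi - alpha) < 1e-6
# X(k) is then Omega^H(k) @ Phi, so the Kalman gain is never formed explicitly.
```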
Thus, the calculation of the inverse of the R matrix is reduced to matrix-vector multiplications within the recursive structure, and the Kalman gain is not computed explicitly. Note that the vector X(k) = Ω^H(k)Φ(k) can also be partitioned as X(k) = [. . . X_t(k)^T . . .]^T. Using the displacement structure for the filtered state estimate discussed above, the element vector X_t(k) can further be partitioned into upper, center, and lower portions as X_t(k) = [X_t^U(k)^T  X_t^C(k)^T  X_t^L(k)^T]^T, where
X_t^U(k) = Ω^U(k)^H Φ(k),
X_t^C(k) = Ω^C(k)^H Φ(k), and
X_t^L(k) = Ω^L(k)^H Φ(k).
Using the displacement feedback, the upper portion can be fed back as soon as it is ready, speeding up the iteration pipelining, and the complexity of this portion is thus reduced dramatically.
g. Update of Predicted State Error Correlation
Another computation involving the Kalman gain and the inverse of the innovation correlation matrix is the update of the predicted state error correlation P(k|k). With the definition Π(k) = R^−1(k)Ω(k), the CG procedure needs to be applied to the column vectors of Π(k) and Ω(k). Similarly to eqn. (21), it can be shown that Ψ(k) = Ω^H(k)Π(k) can also be partitioned into sub-block matrices for the MIMO configuration; the element block is given by
where Ω_{r,t1}(k) is the corresponding sub-block of the omega matrix Ω(k), and Π(k) is partitioned correspondingly.
Since only the upper-left corner of Ψ_{t1,t2}(k) is of interest, as shown above, the full matrix Π(k) is not necessary and the whole matrix multiplication by Ω^H(k) is redundant. Thus, if Π(k) is defined by column sub-matrices as Π(k) = [Π_1(k) . . . Π_t(k) . . . Π_M(k)], and each Π_t(k) is further partitioned into a left portion and a right portion as Π_t(k) = [Π_t^L(k) Π_t^R(k)], we only need to calculate the left portion with the CG iterative algorithm. Because the iterative algorithm finally reduces to matrix-vector multiplications in a loop, the columns of interest can easily be identified and picked out by simply ignoring the right portions. The effective data for both Ω^H(k) and Π(k) are shown in the accompanying drawings.
In Table IV, the notation A_{i,j}(:,l) denotes the lth column vector of the matrix A_{i,j}, which in turn is a sub-block of the (full) matrix A; that is, A is partitioned into sub-block matrices A_{i,j}, and each sub-block is expressible column-wise as A_{i,j} = [A_{i,j}(:,1) . . . A_{i,j}(:,l) . . .].
Also, ρ_{t,l}, σ_{t,l}, and κ_{t,j+1,l} are scalars, and η_t^j(:,l), λ_t^{j−1}(:,l), and ξ_t^j(:,l) are the lth column vectors of the sub-matrices η_t^j, λ_t^{j−1}, and ξ_t^j, respectively, where t is the transmit antenna index and j is the iteration number. The related computations in the procedure thus include a matrix-vector multiplication as in Rλ_t^{j−1}, inner products of two vectors (also called scalar products because each results in a scalar) as in [λ_t^{j−1}(:,l)]^H·ξ_t^j(:,l) and η_t^j(:,l)^H·η_t^j(:,l), and scalar-vector multiplications as in ρ_{t,l}·λ_t^{j−1}(:,l), μ_{t,l}·ξ_t^j(:,l), and σ_{t,l}·λ_t^{j−1}(:,l) for l = 1:D. The scalar-vector multiplications are denoted in compact form as λ_t^{j−1}·diag(ρ_{t,1}, . . . , ρ_{t,l}, . . . , ρ_{t,D}), ξ_t^j·diag(ρ_{t,1}, . . . , ρ_{t,l}, . . . , ρ_{t,D}), and λ_t^{j−1}·diag(σ_{t,1}, . . . , σ_{t,l}, . . . , σ_{t,D}), respectively, where diag(ρ_{t,1}, . . . , ρ_{t,l}, . . . , ρ_{t,D}) denotes the diagonal matrix with those scalars on its diagonal.
(Thus, λ_t^{j−1}·diag(ρ_{t,1}, . . . , ρ_{t,l}, . . . , ρ_{t,D}) is compact notation for the set of products ρ_{t,l}·λ_t^{j−1}(:,l) for l = 1:D, and so there is actually no matrix computation; we only carry out a scaling of each column vector by the corresponding scalar ρ_{t,l}.)
Note that the D columns in the tth sub-column matrix can be computed independently, as can the M sub-columns. The computation of κ_{t,0,l} = η_t^0(:,l)^H·η_t^0(:,l) is actually a norm computation for the lth column vector of η_t^0, i.e. η_t^0(:,l). Similarly, ξ_t^j(:,l) is the lth column vector of ξ_t^j and λ_t^{j−1}(:,l) the lth column vector of λ_t^{j−1}. The computation of κ_{t,j+1,l} and ρ_{t,l} involves only the so-called dot product (scalar product) of two vectors. The matrix multiplications λ_t^{j−1}·diag(ρ_{t,1} . . . ρ_{t,l} . . . ρ_{t,D}), ξ_t^j·diag(ρ_{t,1} . . . ρ_{t,l} . . . ρ_{t,D}), and λ_t^{j−1}·diag(σ_{t,1} . . . σ_{t,l} . . . σ_{t,D}) are actually implemented by independent scaling of the column vectors of the left-side matrix. The computation is dominated by the matrix-submatrix multiplication ξ_t^j = Rλ_t^{j−1}, which requires D independent matrix-vector multiplications. Thus, the direct matrix inverse of R is avoided and the "inverse + multiplication" is reduced to a small number of matrix-vector multiplications in an iteration loop. Combined with the complexity reduction in calculating Ψ_{t1,t2}(k), the complexity order is reduced significantly.
Note that the CG algorithm alone converts the matrix inverse of R and the matrix multiplication R^−1(k)Ω(k) into iterative matrix-matrix multiplications of the form RΩ. If the whole matrix-matrix multiplication needed to be computed, the complexity would still be O(N³). However, with the displacement structure, we only need to compute a portion of the matrix-matrix multiplication RΩ, which is determined by the effective data of Ω as discussed above.
It is to be understood that the above-described arrangements are only illustrative of the application of the principles of the present invention. Numerous modifications and alternative arrangements may be devised by those skilled in the art without departing from the scope of the present invention, and the appended claims are intended to cover such modifications and arrangements.