The invention relates to wireless communication networks, and, more particularly, to techniques for effective wireless communication in the presence of fading and other degradations.
The physical limitations of a wireless channel pose significant challenges for reliable communication. A variety of techniques have been devised to address such issues, including antenna diversity which is seen as a practical and effective technique for reducing the effect of multipath fading in most scattering environments. The classical approach to antenna diversity is to use multiple antennas at the receiver and perform combining or some form of selection to improve the quality of the received signal. Recently, transmitter diversity techniques have been explored, primarily motivated by the feasibility of having multiple antennas at the base station. Spatial multiplexing provided by transmitter diversity facilitates multiple data pipes within the same frequency band, thereby yielding a linear increase in capacity. It has also been discovered that an effective approach to increasing the data rate as well as the power efficiency over wireless channels is to introduce temporal and spatial correlation into signals transmitted from different antennas. This has led to the design of what are referred to in the art as “space-time codes” in which information is transmitted as codewords from multiple antennas at multiple time intervals typically in the form of complex valued amplitudes modulated onto a carrier wave. See, e.g., J.-C. Guey, M. P. Fitz, M. R. Bell, and W.-Y. Kuo, “Signal Design for Transmitter Diversity Wireless Communication Systems over Rayleigh Fading Channels,” Proc. IEEE VTC'96, pp. 136-140, 1996; V. Tarokh, N. Seshadri, A. R. Calderbank, “Space-Time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Construction,” IEEE Trans. Inform. Theory, vol. 44, pp. 744-765, March 1998.
Linear dispersion (LD) codes, for example, are a form of space-time code that uses a linear modulation framework in which the transmitted codeword is a linear combination, over space and time, of certain dispersion matrices weighted by the transmitted symbols. See B. Hassibi and B. Hochwald, “High-Rate Codes that are Linear in Space and Time”, IEEE Trans. Inform. Theory, vol. 48, pp. 1804-1824, July 2002. Linear dispersion codes have the advantage of a very simple encoder design and, furthermore, can be decoded very efficiently either by a polynomial time maximum likelihood decoder, i.e., a sphere decoder, or by a suboptimal decoder, e.g., a nulling and cancellation receiver. The linear dispersion codes disclosed by Hassibi et al. were designed to optimize average mutual information; unfortunately, maximizing the average mutual information does not necessarily lead to better performance in terms of error rate. More recently, another scheme based on the linear dispersion code framework, called threaded algebraic space-time (TAST) coding, has been proposed. See H. El Gamal and M. O. Damen, “Universal Space-Time Coding,” IEEE Trans. Inform. Theory, vol. 49, pp. 1097-1119, May 2003. TAST codes are designed based on the threaded layering concept and algebraic number theory, and the design focuses on the worst-case pairwise error probability (PEP). The pairwise error probability, however, may not be the appropriate target for performance evaluation either. The actual dependence of error probability on SNR passes not only through the PEPs but also through the “error coefficients” of the code, i.e., the multiplicity of codeword pairs that lead to the same PEP. In general, codes optimized with respect to the worst-case pairwise error probability do not necessarily achieve optimum bit or frame error performance.
Accordingly, there is a need for a new approach to the construction of space-time codes that can be optimized to a selected performance metric while still remaining flexible enough to handle different decoder structures.
A design methodology is disclosed herein which is capable of constructing space-time codes for encoding signals from any number of transmitter antennas, where the codes advantageously can be optimized for an arbitrary performance metric, such as bit or frame error probability, and for a selected decoder structure. In accordance with an embodiment of the invention, stochastic approximation is utilized to construct a set of space-time codes for a system with a pre-specified number of transmit and receive antennas. A series of simulated observations are generated using a model of the known communication channel characteristics. The simulated observations are decoded using a selected receiver structure, and measurements are computed of the selected performance characteristic to be optimized for the system. An estimate of the gradient of the performance characteristic as a function of the coding parameters utilized is obtained, and the gradient estimate is then used to update the coding parameters. The updates to the coding parameters can be iterated until convergence to an optimal set of space-time codes. The space-time codes can then be used to encode transmissions from a transmitter with the pre-specified number of transmit antennas to a receiver utilizing the pre-specified number of receive antennas and the selected receiver structure.
The present invention advantageously can be utilized in systems where performance analysis based on algebraic number theory is intractable. Unlike the prior art, the disclosed approach can generate space-time codes which can be applied to a wide range of receiver structures. Moreover, the codes can be optimized in a manner that takes into account issues such as long term spatially correlated fading.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The transmitter 110 utilizes a coder 130 which arranges the transmitted symbols such that the signal transmitted from the i-th transmit antenna at time index t is denoted by $x_{t,i}$. The receiver 120 has a corresponding decoder 150, where the signal received at the j-th receive antenna at time t is denoted by $y_{t,j}$. The input-output relation is given by
$$y_{t,j} = \sqrt{\rho} \sum_{i=1}^{M_T} h_{i,j}\, x_{t,i} + w_{t,j}, \qquad (1)$$
where $h_{i,j}$ is the channel gain from the i-th transmit antenna to the j-th receive antenna, and the noise $w_{t,j}$ can be modeled as independent samples of a zero-mean complex Gaussian random variable with unit variance. The transmitted energy on all the $M_T$ antennas 111, 112, . . . 115 at any given time can be normalized to unity, so that ρ is the expected SNR at each receive antenna 121, 122, . . . 125 regardless of the number of transmit antennas. This equation can be written in matrix form as
$$Y = \sqrt{\rho}\, X H + W, \qquad (2)$$
where Y is the $T \times M_R$ matrix of the received signal, X is the $T \times M_T$ matrix of the transmitted signal, W is the $T \times M_R$ matrix of the additive white Gaussian noise, and H is the $M_T \times M_R$ channel matrix. When restricted to a Rayleigh fading scenario, the $M_T \times M_R$ elements of H are independent identically distributed (i.i.d.) circularly symmetric complex Gaussian random variables with zero mean and unit variance.
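The matrix model above can be simulated directly. The following is a minimal sketch, assuming NumPy and an arbitrary placeholder codeword; the function name `simulate_channel` and the numeric values are illustrative choices and do not come from the specification.

```python
import numpy as np

def simulate_channel(X, rho, M_R, rng):
    """Return (Y, H, W) for the model Y = sqrt(rho) * X @ H + W."""
    T, M_T = X.shape
    # i.i.d. circularly symmetric complex Gaussian entries, zero mean, unit variance
    H = (rng.standard_normal((M_T, M_R)) + 1j * rng.standard_normal((M_T, M_R))) / np.sqrt(2)
    W = (rng.standard_normal((T, M_R)) + 1j * rng.standard_normal((T, M_R))) / np.sqrt(2)
    Y = np.sqrt(rho) * X @ H + W
    return Y, H, W

rng = np.random.default_rng(0)
T, M_T, M_R, rho = 4, 2, 2, 10.0  # illustrative sizes and SNR (assumed)
# Placeholder codeword with unit total transmit energy in every time slot
X = np.ones((T, M_T), dtype=complex) / np.sqrt(M_T)
Y, H, W = simulate_channel(X, rho, M_R, rng)
print(Y.shape)
```

With this normalization, each row of X carries unit energy, which corresponds to the convention that ρ is the expected SNR at each receive antenna.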
Note that the following notation is utilized herein: scalars are denoted in lower case, vectors are column vectors unless otherwise indicated and are denoted in lower case bold, while matrices are in upper case bold. Tr(R) denotes the trace of the matrix R. $R^T$ and $R^H$ denote the transpose and the conjugate transpose of R, respectively. $I_M$ is the $M \times M$ identity matrix.
The goal is to construct codes for the coder 130 and decoder 150 that optimize some performance characteristic of the system. For example, it can be advantageous to optimize a performance metric such as the average bit error probability (BEP) or the frame error performance. Consider, for example and without limitation, a set of linear dispersion codes. See B. Hassibi and B. Hochwald, “High-Rate Codes that are Linear in Space and Time”, IEEE Trans. Inform. Theory, vol. 48, pp. 1804-1824, July 2002. The linear dispersion codes introduced therein are designed to maximize the average mutual information. Unfortunately, maximizing the average mutual information does not necessarily lead to better performance in terms of error rate. Accordingly, it is advantageous to optimize the linear dispersion codes in terms of error rate rather than mutual information. The difficulty is that the average bit and frame error rates are hard to analyze for arbitrary linear dispersion codes.
Linear dispersion codes use a linear modulation framework, and the transmitted codeword is a linear combination of certain dispersion matrices with the transmitted symbols. Assuming that Q r-QAM symbols $\{s_q\}_{q=1}^{Q}$ are transmitted over T symbol intervals, the linear dispersion codeword X is given by
$$X = \sum_{q=1}^{Q} \left( \alpha_q A_q + j \beta_q B_q \right), \qquad (3)$$
where the transmitted symbols sq have been decomposed into their real and imaginary parts
$$s_q = \alpha_q + j\beta_q, \quad q = 1, \ldots, Q,$$
and $\{A_q, B_q\}_{q=1}^{Q}$ are the dispersion matrices that specify the codes. The rate of the codes is $R = (Q/T) \log_2 r$. It is also assumed that the dispersion matrices $\{A_q, B_q\}_{q=1}^{Q}$ satisfy the following energy constraint
Denote $Y_R = \Re\{Y\}$ and $Y_I = \Im\{Y\}$. Denote the columns of $Y_R$, $Y_I$, $H_R$, $H_I$, $W_R$ and $W_I$ by $y_{R,n}$, $y_{I,n}$, $h_{R,n}$, $h_{I,n}$, $w_{R,n}$ and $w_{I,n}$; and define
Then, the equations can be gathered in YR and YI to form the single real system of equations
where the equivalent 2MRT×2Q real channel matrix is given by
As mentioned above, the average bit and frame error rates are hard to analyze for arbitrary linear dispersion codes. For example, the empirical bit error probability (BEP) is denoted herein as $\gamma(y, x, h, \theta)$ for a given set of dispersion matrices, a given channel realization, a given information symbol vector x, and a given received signal vector y. The set of dispersion matrices is denoted as
$$\theta = \{A_q, B_q,\ q = 1, \ldots, Q\}, \qquad (10)$$
and the channel realization as h, i.e.,
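When θ is given, the average BEP is obtained by
$$\Upsilon(\theta) = E\left[\gamma(y, x, h, \theta)\right], \qquad (12)$$
where the expectation is taken with respect to $p(y, x, h \mid \theta)$, the joint probability density function (pdf) of (y, x, h) for a given θ. Note that the empirical BEP $\gamma(y, x, h, \theta)$ usually cannot be given in closed form. Also, $\gamma(y, x, h, \theta)$ depends on the receiver structure. Optimizing the design of the linear dispersion codes requires a solution to the following optimization problem
$$\min_{\theta \in \Theta} \Upsilon(\theta), \qquad (13)$$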
where the constraint set Θ is given by
Note that one does not lose any optimality in the constraint set by relaxing the energy constraint as the minimum cost always occurs when the energy constraint is satisfied with equality. From the above,
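$$\Upsilon(\theta) = E_x E_h E_{y|x,h,\theta}\left(\gamma(y, x, h, \theta)\right), \qquad (15)$$
where
$$E_{y|x,h,\theta}\left(\gamma(y, x, h, \theta)\right) = \int \gamma(y, x, h, \theta)\, p(y \mid x, h, \theta)\, dy, \qquad (16)$$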
with p (y|x, h, θ) the conditional pdf of y given (x, h, θ). Note that because p (y|x, h, θ) can be shown to be Gaussian, and it is continuously differentiable in θ, it follows that Υ (θ) is continuously differentiable in θ. Hence Υ (θ) attains a minimum on the compact set Θ.
Although there is no closed-form formula for γ (y, x, h, θ), it can be evaluated by using the technique of simulation optimization. See, e.g., M. C. Fu, “Optimization via Simulation: A Review,” Annals of Operations Research, Vol. 53, pp. 199-248, 1994; S. Andradottir, “A Review of Simulation Optimization Techniques,” Proceedings of the 1998 Winter Simulation Conference, pp. 151-158, 1998.
At step 201, a set of initial coding parameters is selected for the space-time code. The initial coding parameters can be selected randomly or, more preferably, optimized in some manner. At step 202, a series of simulated observations is generated using a model of the known communication channel characteristics. The simulated observations are decoded using a selected receiver structure. The receiver structure advantageously can be other than the conventional maximum likelihood receiver and can even be one of the many suboptimal detector designs. Measurements of the selected performance metric to be optimized for the system can then be computed. At step 203, an estimate of the gradient of the performance metric as a function of the coding parameters utilized is obtained. The basic assumption of stochastic approximation is that the solution to the optimization problem can be found by locating the zeros of the gradient. At step 204, the coding parameters are updated using the gradient estimate. Steps 202 to 204 can then be iterated until the coding parameters, at step 205, converge to some advantageous solution. In accordance with the Robbins-Monro algorithm, the coding parameters should converge to a locally optimal solution as long as the bias of the gradient estimates goes to zero.
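The iteration in steps 202 to 204 can be illustrated on a toy problem. The sketch below is not the code design procedure itself: it applies a Robbins-Monro style update with a decreasing step size of the form c/k (an assumed, commonly used choice) to a noisy quadratic whose minimizer is known, so that convergence is easy to check. The names `robbins_monro`, `noisy_grad`, and `target` are hypothetical.

```python
import numpy as np

def robbins_monro(noisy_grad, theta0, c=1.0, n_iter=2000):
    """Iterate theta <- theta - (c / k) * g_k, where g_k is a noisy gradient."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        a_k = c / k                      # decreasing step size
        theta = theta - a_k * noisy_grad(theta)
    return theta

rng = np.random.default_rng(1)
target = np.array([1.0, -2.0])           # known minimizer of the toy objective

def noisy_grad(theta):
    # Gradient of 0.5 * ||theta - target||^2 plus zero-mean (unbiased) noise
    return (theta - target) + rng.standard_normal(theta.shape)

theta_hat = robbins_monro(noisy_grad, theta0=np.zeros(2))
print(theta_hat)
```

In the design procedure described above, the role of `noisy_grad` would be played by a simulation-based gradient estimate of the selected performance metric.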
In
$$y_i = \sqrt{\rho}\, H_i x_i + w_i, \quad i = 1, 2, \ldots, M. \qquad (17)$$
At step 303, the selected decoding structure is modeled to decode $x_i$ based on the observations $y_i$ and the channel value $H_i$, i=1, 2, . . . , M. At step 304, the empirical BEP $\gamma(y_i, x_i, h_i, \theta_k)$ can then be computed.
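Steps 303 and 304 can be sketched for a small example. The code below assumes 4-QAM symbols whose real and imaginary parts take values in {-1, +1}, randomly drawn dispersion matrices, and an exhaustive maximum likelihood search standing in for the selected decoder; it works in the complex matrix model rather than the equivalent real system. All function names are hypothetical.

```python
import itertools
import numpy as np

def ld_encode(alpha, beta, A, B):
    """X = sum_q (alpha_q A_q + j beta_q B_q); A and B have shape (Q, T, M_T)."""
    return np.tensordot(alpha, A, axes=1) + 1j * np.tensordot(beta, B, axes=1)

def ml_decode(Y, H, rho, A, B):
    """Exhaustive ML search over all sign patterns (alpha, beta) in {-1, +1}^(2Q)."""
    Q = A.shape[0]
    best, best_cost = None, np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=2 * Q):
        s = np.array(signs)
        X = ld_encode(s[:Q], s[Q:], A, B)
        cost = np.linalg.norm(Y - np.sqrt(rho) * X @ H) ** 2
        if cost < best_cost:
            best, best_cost = s, cost
    return best

def empirical_bep(s_true, s_hat):
    """Fraction of real/imaginary components (one bit each) decoded incorrectly."""
    return float(np.mean(s_true != s_hat))

rng = np.random.default_rng(2)
Q, T, M_T, M_R, rho = 2, 2, 2, 2, 10.0   # illustrative sizes (assumed)
A = rng.standard_normal((Q, T, M_T)) + 1j * rng.standard_normal((Q, T, M_T))
B = rng.standard_normal((Q, T, M_T)) + 1j * rng.standard_normal((Q, T, M_T))
s = rng.choice([-1.0, 1.0], size=2 * Q)  # first Q entries: alpha, last Q: beta
H = (rng.standard_normal((M_T, M_R)) + 1j * rng.standard_normal((M_T, M_R))) / np.sqrt(2)
W = (rng.standard_normal((T, M_R)) + 1j * rng.standard_normal((T, M_R))) / np.sqrt(2)
Y = np.sqrt(rho) * ld_encode(s[:Q], s[Q:], A, B) @ H + W
s_hat = ml_decode(Y, H, rho, A, B)
gamma = empirical_bep(s, s_hat)
print(gamma)
```

The exhaustive search grows as 4^Q and is only practical for very small Q; a sphere decoder or a suboptimal receiver would take its place in a realistic setting.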
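At step 305, it is necessary to generate an estimate of the gradient $\nabla_\theta \Upsilon(\theta)$ with respect to the current set of dispersion matrices, $\theta_k$. As discussed below, although the gradient cannot be computed analytically, it is possible to generate an estimator using the score function or likelihood ratio method. The following estimator is generated:
$$\hat{g}(\theta_k) = \frac{1}{M} \sum_{i=1}^{M} \gamma(y_i, x_i, h_i, \theta_k)\, \nabla_\theta \log p(y_i \mid x_i, h_i, \theta)\Big|_{\theta=\theta_k}, \qquad (18)$$
where an explicit formula for $\nabla_\theta \log p(y \mid x, h, \theta)$ is given below. It can be shown that the gradient estimator is unbiased, i.e.,
$$E\left(\hat{g}(\theta_k)\right) = \nabla_\theta \Upsilon(\theta)\Big|_{\theta=\theta_k}. \qquad (19)$$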
Although the estimator is unbiased for any integer M, the variance will be smaller for larger M.
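The effect of the sample size M on a score-function estimator can be checked on a toy problem where the true gradient is known. The sketch below is an assumed scalar analogue, not the estimator for dispersion matrices: y is drawn from N(θ, 1), the cost is the indicator of y > 0 (piecewise constant, like an error count), and the exact gradient of the average cost is the standard normal density evaluated at θ. The name `score_gradient` is hypothetical.

```python
import numpy as np

def score_gradient(theta, M, rng):
    """Average of cost times score over M simulated samples."""
    y = theta + rng.standard_normal(M)   # y_i ~ N(theta, 1)
    gamma = (y > 0).astype(float)        # piecewise-constant cost
    score = y - theta                    # d/dtheta of log N(y; theta, 1)
    return float(np.mean(gamma * score))

rng = np.random.default_rng(3)
theta = 0.5
# Exact gradient of P(y > 0) with respect to theta: standard normal pdf at theta
true_grad = np.exp(-theta ** 2 / 2) / np.sqrt(2 * np.pi)
for M in (10, 1000):
    estimates = [score_gradient(theta, M, rng) for _ in range(500)]
    print(M, np.mean(estimates), np.std(estimates), true_grad)
```

Both sample sizes give averages near the true gradient, while the spread of the estimates shrinks as M grows.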
At step 306 in
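$$\theta_{k+1} = \Pi_\Theta\left(\theta_k - a_k\, \hat{g}(\theta_k)\right), \qquad (20)$$
where $a_k = c/k$ for some positive constant c. For a given set of dispersion matrices $\theta = \{A_q, B_q,\ q = 1, \ldots, Q\}$, the projection function $\Pi_\Theta$ is defined by
$$\Pi_\Theta(\theta) = \begin{cases} \theta, & \theta \in \Theta, \\ \tilde{\theta}, & \theta \notin \Theta, \end{cases} \qquad (21)$$
where $\tilde{\theta} = \{\tilde{A}_q = d A_q,\ \tilde{B}_q = d B_q,\ q = 1, \ldots, Q\}$ with the scaling factor d given by
As long as the bias of the gradient estimate $\hat{g}(\theta_k)$ goes to zero, the sequence of estimates of the optimal solution should converge.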
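The projected update can be sketched as follows. Because the exact constant in the energy constraint is not reproduced here, the sketch takes it as a free parameter, `energy_budget`, which is an assumption, as are the function names. The projection leaves a feasible point unchanged and otherwise rescales all dispersion matrices by a common factor d so that the total energy equals the budget.

```python
import numpy as np

def total_energy(A, B):
    """Sum over q of Tr(A_q^H A_q) + Tr(B_q^H B_q) for stacks of shape (Q, T, M_T)."""
    return float(np.sum(np.abs(A) ** 2) + np.sum(np.abs(B) ** 2))

def project(A, B, energy_budget):
    """Scale (A, B) back onto the energy constraint when it is violated."""
    e = total_energy(A, B)
    if e <= energy_budget:
        return A, B
    d = np.sqrt(energy_budget / e)       # common scaling factor
    return d * A, d * B

def sa_update(A, B, grad_A, grad_B, k, c, energy_budget):
    """One projected step with step size a_k = c / k."""
    a_k = c / k
    return project(A - a_k * grad_A, B - a_k * grad_B, energy_budget)

# Usage with placeholder matrices and placeholder gradient estimates
A = np.ones((2, 2, 2), dtype=complex)
B = np.ones((2, 2, 2), dtype=complex)
A_next, B_next = sa_update(A, B, 0.1 * A, 0.1 * B, k=1, c=1.0, energy_budget=4.0)
print(total_energy(A_next, B_next))
```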
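It can be shown that if $\theta \notin \Theta$, the function $\Pi_\Theta$ set forth above projects θ to the nearest point in Θ. In essence, when $\theta \notin \Theta$, the function $\Pi_\Theta$ simply scales θ such that the energy constraint is satisfied with equality. Let $\Phi = \{\varphi \in \mathbb{R}^d : \|\varphi\| \le 1\}$, and for $\theta \notin \Phi$ define
$$\psi = \frac{\theta}{\|\theta\|}.$$
Then, it can be proven that
$$\psi = \arg\min_{\varphi \in \Phi} \|\theta - \varphi\|.$$
Clearly $\psi \in \Phi$, and for any $\varphi \in \Phi$, we have
$$\|\theta - \varphi\| \ge \|\theta\| - \|\varphi\| \ge \|\theta\| - 1 = \|\theta - \psi\|.$$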
From the above, the set of optimum dispersion matrices $\theta = \{A_q, B_q,\ q = 1, \ldots, Q\}$ depends on the number of transmit antennas $M_T$, the number of receive antennas $M_R$, and the QAM constellation used. The search result θ also depends on the operating SNR, as both the empirical BEP $\gamma(y, x, h, \theta)$ and $\nabla_\theta \log p(y \mid x, h, \theta)$ depend on the SNR (see below). Therefore, the codes generated by the design procedure set forth in
Although $\nabla_\theta \Upsilon(\theta)$ cannot be computed analytically, it is possible to estimate it. For a given set of dispersion matrices θ, a given information symbol vector x, and a given channel realization h, it can be shown from Equation (8) that y is Gaussian with mean $\sqrt{\rho} H x$ and covariance matrix $\frac{1}{2} I_{2TM_R}$, i.e.,
$$p(y \mid x, h, \theta) = \pi^{-TM_R} \exp\left( -(y - \sqrt{\rho} H x)^T (y - \sqrt{\rho} H x) \right). \qquad (26)$$
From Equations (15) and (16),
where it is assumed some regularity conditions hold such that the derivative and integral can be interchanged.
It can be shown for maximum likelihood detection, with probability one, as well as for suboptimal decoders such as nulling and cancellation receivers, that
∇θγ(y, x, h, θ)=0. (28)
The proof for this proposition is provided in an APPENDIX. From this proposition, it can be shown that
The gradient estimator in the above form is referred to as the score function.
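The step from the proposition to the score-function form can be sketched as follows, assuming the interchange of derivative and integral noted above and using the identity $\nabla_\theta p = p\, \nabla_\theta \log p$. This is a reconstruction from the surrounding definitions rather than a verbatim restatement of the original equations.

```latex
\begin{align*}
\nabla_\theta \Upsilon(\theta)
 &= E_x E_h \int \big[ \nabla_\theta \gamma(y,x,h,\theta)\, p(y|x,h,\theta)
      + \gamma(y,x,h,\theta)\, \nabla_\theta p(y|x,h,\theta) \big]\, dy \\
 &= E_x E_h \int \gamma(y,x,h,\theta)\, \nabla_\theta p(y|x,h,\theta)\, dy
      && \text{since } \nabla_\theta \gamma = 0 \text{ with probability one} \\
 &= E_x E_h \int \gamma(y,x,h,\theta)\,
      \big[ \nabla_\theta \log p(y|x,h,\theta) \big]\, p(y|x,h,\theta)\, dy \\
 &= E_x E_h E_{y|x,h,\theta} \big[ \gamma(y,x,h,\theta)\,
      \nabla_\theta \log p(y|x,h,\theta) \big].
\end{align*}
```

Replacing the expectation in the last line by an average over M simulated triples (y, x, h) gives a sample-based gradient estimate.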
The gradients $\nabla_\theta \log p(y \mid x, h, \theta)$ required by the score function above can be computed as follows. Note that it is necessary simply to compute the gradient of the following function
$$f = -(y - \sqrt{\rho} H x)^T (y - \sqrt{\rho} H x). \qquad (30)$$
The gradient of f is first computed with respect to AR,q. The (n,l)th entry of the gradient of f (AR,q) is
where ζn and ηl are T-dimensional and MT-dimensional unit column vectors with one in the nth and lth entries, respectively, and zeros elsewhere. From Equation (9),
HA
For the gradients with respect to AI,q, BR,q, and BI,q, similar expressions can be given as
The design methodology depicted in
Most work on space-time codes assumes the idealistic case of independent and identically distributed (i.i.d.) channels, i.e., the spatial fading is uncorrelated. However, in reality, the individual antennas could be correlated due to insufficient antenna spacing and lack of scattering. In certain situations, it can be assumed that the spatial correlation structure is known in advance. This allows the above approach to be tailored to optimally design the space-time code for a specific fading correlation scenario. For example, one could design linear dispersion codes for a typical correlation scenario where the long term spatial correlation can be measured beforehand. It is very difficult (if not impossible) to optimize the design of space-time codes analytically for a specific transmit and receive correlation structure. The present approach turns out to be useful in this scenario as well. The spatial fading correlation depends on the physical geometries of the channel. Assume there is correlation at both the transmitter side and the receiver side. One can employ an advantageous spatial fading correlation model, e.g., as disclosed in H. Bölcskei and A. J. Paulraj, “Performance of Space-Time Codes in the Presence of Spatial Fading Correlations,” Proc Asilomar Conference, September 2000, wherein the channel matrix H can be decomposed into three parts, namely,
$$H = S^{1/2} H_\omega R^{1/2}, \qquad (46)$$
where $H_\omega$ is an $M_T \times M_R$ matrix composed of i.i.d. complex Gaussian entries with zero mean and unit variance, and $S = S^{1/2} (S^{1/2})^H$ and $R = R^{1/2} (R^{1/2})^H$ are the transmit and receive correlation matrices, respectively. It should be noted that the product form assumed above does not cover the most general case of spatial fading correlation. The more general approach is to specify the correlation of the channel realization vector h defined in Equation (11). When the long term correlation, i.e., S and R, is known in advance, this knowledge should be taken into account to lower the error probability. The only modification to the present approach in this case is that the channel matrix H should be randomly generated in accordance with the particular correlation model. All the other steps in the present approach remain the same. The present approach advantageously will still “automatically” generate the optimal codes adapted to the specific correlation structure.
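Generating H according to the product-form correlation model can be sketched as below. The exponential correlation matrices produced by `exp_corr` are a hypothetical example of known long term correlation, and the Hermitian square root is one of several valid choices of $S^{1/2}$ and $R^{1/2}$; the function names are illustrative.

```python
import numpy as np

def matrix_sqrt_psd(C):
    """Hermitian square root of a positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)      # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def correlated_channel(S, R, rng):
    """Draw H = S^(1/2) H_w R^(1/2) with H_w having i.i.d. CN(0, 1) entries."""
    M_T, M_R = S.shape[0], R.shape[0]
    H_w = (rng.standard_normal((M_T, M_R)) + 1j * rng.standard_normal((M_T, M_R))) / np.sqrt(2)
    return matrix_sqrt_psd(S) @ H_w @ matrix_sqrt_psd(R)

def exp_corr(n, r):
    """Hypothetical exponential correlation model with entries r^|i - j|."""
    idx = np.arange(n)
    return r ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(4)
S = exp_corr(2, 0.7)   # transmit correlation (assumed values)
R = exp_corr(2, 0.3)   # receive correlation (assumed values)
H = correlated_channel(S, R, rng)
print(H.shape)
```

In the design loop, this generator would simply replace the i.i.d. channel draw; the decoding, performance measurement, and update steps are unchanged.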
It will be appreciated that those skilled in the art will be able to devise numerous arrangements and variations which, although not explicitly shown or described herein, embody the principles of the invention and are within its spirit and scope. For example, and without limitation, the present invention has been described with particular reference to generating arbitrary linear dispersion codes. As mentioned above, the approach can be readily extended to other forms of space-time codes that can be parameterized in an advantageous manner.
Appendix
The following sets forth a proof for the following proposition: For maximum likelihood detection, with probability one,
∇θγ(y, x, h, θ)=0. (47)
The n-th entry of the gradient is given by
where ζn is a unit column vector with one in the nth entry, and zeros elsewhere. Note that γ (y, x, h, θ) is the empirical BEP for a given set of dispersion matrices θ, a given information symbol vector x, a given channel realization h, and a given received signal vector y, namely
where $\{x \to \hat{x} \mid y, x, h, \theta\}$ denotes the event of decoding error in which the transmitted symbol vector x is decoded into $\hat{x}$, given the channel realization h, the received signal vector y, and the dispersion matrices θ. For the ML decoder, we have
where C is the set composed of all the possible transmitted symbol vectors x. Note that from (9), H depends on the dispersion matrices θ. When θ is perturbed by a small amount Δθ=δζn, we have
Hθ+Δθ=H+ΔH, (51)
where ΔH represents the small perturbation caused by the perturbation Δθ. We need to show that if δ is sufficiently small, then
which means that with a small perturbation of the dispersion matrices θ, given the channel realization h and the received signal vector y, we will end up with the same vector $\hat{x}$ after decoding. Now we denote
Then it is easy to show that δ>0 with probability one. Therefore, we assume that δ>0. We have
Note that
where we have used the fact that $\mathrm{Tr}(BAB^T) \le \lambda_{\max}(A)\, \mathrm{Tr}(BB^T)$, $\lambda_{\max}(\hat{x}\hat{x}^T) = \|\hat{x}\|^2$, and $\|A\|_F$ is the Frobenius norm given by $\|A\|_F = \sqrt{\mathrm{Tr}(AA^T)}$. Note that due to the finite cardinality of C, we can find a constant such that
then from (55), we have
$$\|\Delta H\, \hat{x}\| \le \|\Delta H\|_F. \qquad (56)$$
Substituting into (54), we obtain
$$\|y - \sqrt{\rho}\, H_{\theta+\Delta\theta}\, \hat{x}\| \le \|y - \sqrt{\rho}\, H \hat{x}\| + \sqrt{\rho}\, \|\Delta H\|_F. \qquad (57)$$
Similarly, we can show that for any $s \in C$, $s \ne \hat{x}$,
Combining the above two equations, we obtain that for any $s \in C$, $s \ne \hat{x}$,
Note that due to the continuity of $\|H\|_F$ with respect to the set of dispersion matrices θ, as can be seen from (9), when δ is sufficiently small, we have
Therefore, we obtain for any $s \ne \hat{x}$
which means that
Therefore, when δ is sufficiently small, we have
γ(y, x, h, θ+δζn)=γ(y, x, h, θ) (62)
which means that (47) holds.
Number | Date | Country
---|---|---
20060018403 A1 | Jan 2006 | US