The exemplary and non-limiting embodiments of this invention relate generally to wireless communication systems, methods, devices and computer program products and, more specifically, relate to techniques for receiving a transmission in a multiple input-multiple output wireless communication system.
Various abbreviations that appear in the specification and/or in the drawing figures are defined as follows:
3GPP 3rd generation partnership project
ASIC application specific integrated circuit
MAP maximum a posteriori
MIMO multiple input multiple output
LTE long term evolution
OFDM orthogonal frequency division multiplexing
PED partial Euclidian distance
QAM quadrature amplitude modulation
QLD QL decomposition
QRD QR decomposition
SISO soft input soft output
Of particular interest herein are detection algorithms for MIMO OFDM downlink receivers, such as those utilized in emerging wireless standards targeting high data-rates. In such wireless systems a large spectral efficiency is assumed, and multiple transmit antennas are combined with high-order modulation. A particular problem that arises at the receiver is the very large computational complexity of various proposed detection algorithms that exhibit an excellent quality of service, such as MAP detection. The high computational complexity can inhibit, or prohibit, the use of such algorithms in a practical implementation. As a result, sub-optimal detection schemes that exhibit an acceptable error-rate performance, with a reduced computational complexity/latency, are becoming particularly attractive. One such lower complexity detector algorithm is known as the sphere detection algorithm.
In order to provide the lower complexity approximation of optimal joint detection-decoding, an inner soft sphere detector is interfaced with an outer SISO channel decoder. There are several proposed sphere detection schemes. One of the most popular is the QRD-M or K-best detection sphere algorithm, with M=16, that is proposed for emerging MIMO OFDM downlink receivers, such as 3GPP-LTE, IMT-advance, 4G, WLAN, and WiMAX downlink receivers. This algorithm provides a high and constant detection throughput with acceptable error-rate performance.
Reference may be made to K. J. Kim et al., “A QRD-M/Kalman filter-based detection and channel estimation algorithm for MIMO-OFDM systems”, IEEE Trans. on wireless communication, vol. 4, pp. 710-721, 2005 for a description of conventional QRD-M detection.
Reference may also be had to K. Jeon et al., “An Efficient QRD-M Algorithm Using Partial Decision Feedback Detection”, Signals, Systems and Computers, 2006, ACSSC '06, Fortieth Asilomar Conference, October-November 2006, pgs. 1658-1661.
As explained by Jeon et al., in the conventional QRD-M algorithms the channel matrix H is decomposed as H=QR by using the QR decomposition based on the modified Gram-Schmidt (MGS) method, where Q is an Nr×Nt unitary matrix and R is an Nt×Nt upper triangular matrix. The matrix R is represented as shown in
$$y = Q^{H} r = R s + n',$$
where y is the Nr×1 vector and the statistics of the Nr×1 noise vector $n' = Q^{H} n$ remain unchanged. The foregoing equation is converted into a tree structure by using the upper triangular structure of the matrix R.
The QRD-M algorithm starts from calculating the first branch metrics for all possible s1. The branch metrics are calculated as:
$$|y_1 - R_{1,1} s_1|^2.$$
At the first stage, a constant number M of branches with the smallest accumulated metrics are selected as survival paths. At the second stage, each survival path is extended to |Ω| branches, where |Ω| is the cardinality of the modulation set Ω. Therefore, there are M|Ω| combinations of s1 and s2. Only the M paths with the smallest accumulated metrics out of the M|Ω| are selected. This process is repeated until tree depth Nt is reached. At the last stage, the path with the minimum accumulated metric is detected.
The use of QRD-M detection significantly reduces the complexity of a maximum-likelihood (ML) algorithm, where the search for the most probable transmitted vector-candidate is performed jointly for all transmit antennas. The QRD-M algorithm assumes a breadth-first candidate-search where all symbol-candidates for one transmit antenna are first found before continuing the candidate-search for the next transmit antenna. Once all valid candidates for one transmit antenna are found, the best M partial vector-candidates are chosen to proceed with for the next transmit antenna. The best candidates are those with the smallest PEDs. Therefore, for every transmit antenna, the candidate-search process is followed by the sorting of found candidates, which also affects the overall latency of the detection process. When the candidate-search for the last transmit antenna is performed, the best M final candidates are used to compute soft information for corresponding coded bits for the outer SISO decoder.
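By way of a non-limiting illustration, the breadth-first search described above may be sketched as follows. The function name, argument layout and 0-based indexing are assumptions made for this sketch only; in particular, the sketch begins at the last row of the upper triangular matrix R (the row containing a single non-zero element), which may differ from the index convention used in the cited references.

```python
def qrd_m_detect(R, y, constellation, M):
    """Breadth-first (K-best) tree search over an upper-triangular R.

    R: list of rows (Nt x Nt, upper triangular), y: rotated receive vector,
    constellation: iterable of complex symbols, M: survivors kept per level.
    Returns the surviving (metric, symbols) pairs, best first; symbols[i]
    is the candidate for row/antenna index i (0-based).
    """
    n = len(R)
    survivors = [(0.0, ())]  # (accumulated metric, symbols already fixed)
    for row in range(n - 1, -1, -1):  # assumption: start at the last row
        extended = []
        for metric, tail in survivors:
            # Interference from symbols already fixed on this path
            # (tail holds the symbols for rows row+1 .. n-1).
            interference = sum(R[row][row + 1 + j] * s for j, s in enumerate(tail))
            for s in constellation:
                branch = abs(y[row] - R[row][row] * s - interference) ** 2
                extended.append((metric + branch, (s,) + tail))
        extended.sort(key=lambda cand: cand[0])  # sort by accumulated metric
        survivors = extended[:M]                 # keep the M best paths
    return survivors
```

In this sketch a full sort is performed at every tree level, which corresponds to the per-antenna sorting step that contributes to the latency of the conventional approach.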
Reference may also be made to commonly owned US Patent Application Publication US 2005/0002359 A1, published Jan. 6, 2005, “Apparatus, and Associated Method, for Detecting Data Communicated to a Receiving Station in a Multiple-Channel Communication System”, Kyeong Jin Kim, which is incorporated herein in its entirety.
The foregoing and other problems are overcome, and other advantages are realized, in accordance with the non-limiting and exemplary embodiments of this invention.
In a first aspect thereof the exemplary embodiments of this invention provide a method that includes receiving signals from y pairs of antennas, where y is greater than one, and where the received signals convey coded bits of information; processing signals received from pairs of the antennas in parallel to find partial Euclidian distances and determine valid partial candidates for individual antennas; sorting valid partial candidates to find M best partial candidates; combining the M best partial candidates into M² final candidates; and using the M² final candidates in parallel in a plurality of a posteriori probability function units, with corresponding final Euclidian distances, to determine a posteriori reliability information for coded bits.
In another aspect thereof the exemplary embodiments of this invention provide an apparatus that includes a receiver configurable to receive signals from y pairs of antennas, where y is greater than one, and where the received signals convey coded bits of information. The apparatus further includes a detection block comprised of a plurality of search modules configurable to process signals received from pairs of the antennas in parallel to find partial Euclidian distances and determine valid partial candidates for individual antennas; a plurality of sort modules configurable to sort the valid partial candidates to find M best partial candidates to be combined into M² final candidates; and further comprising a plurality of a posteriori probability function units arranged to process the M² final candidates in parallel, with corresponding final Euclidian distances, to determine a posteriori reliability information for coded bits.
In another aspect thereof the exemplary embodiments of this invention provide a computer-readable memory medium that stores program instructions, the execution of the program instructions resulting in operations that comprise simultaneously processing signals received from y pairs of antennas, where y is greater than one and where the received signals convey coded bits of information, to find partial Euclidian distances and determine valid partial candidates for individual antennas; sorting valid partial candidates to find M best partial candidates; combining the M best partial candidates into M² final candidates; and using the M² final candidates in parallel in a plurality of a posteriori probability function units, with corresponding final Euclidian distances, to determine a posteriori reliability information for coded bits.
In a further aspect thereof the exemplary embodiments of this invention provide an apparatus that includes means for simultaneously processing signals received from y pairs of antennas, where y is greater than one and where the received signals convey coded bits of information, for finding partial Euclidian distances and determining valid partial candidates for individual antennas; means for sorting valid partial candidates to find M best partial candidates; means for combining the M best partial candidates into M² final candidates; and means for using the M² final candidates in parallel in a plurality of a posteriori probability function units, with corresponding final Euclidian distances, for determining a posteriori reliability information for coded bits and outputting soft decoded bits to an outer channel decoder.
The foregoing and other aspects of the teachings of this invention are made more evident in the following Detailed Description, when read in conjunction with the attached Drawing Figures, wherein:
The exemplary embodiments of this invention enable a further reduction in the detection complexity and latency of the sphere-type algorithm, while preserving (and even improving) the error-rate performance.
The use of the exemplary embodiments of this invention decreases processing latency of QRD-M detection while preserving the error-rate performance. The QRD-M algorithm with parameter M=16 is a detection scheme proposed for emerging MIMO OFDM downlink wireless receivers, such as the 3GPP-LTE, IMT-advance, 4G, WLAN, and WiMAX downlink receivers, and assumes a case of four transmit and receive antennas and 16-QAM modulation. This is but one non-limiting example of an application for the embodiments of this invention, as they may be readily applied in general to a number of systems having an equal number of antennas in the transmitter and the receiver.
The detection approach in accordance with the exemplary embodiments of this invention is compared below (in terms of detection latency, computational complexity and error-rate performance) with conventional QRD-M detection.
The exemplary embodiments of this invention provide an improved detection procedure, referred to for convenience as a QRD-QLD detection algorithm, that simplifies the candidate-search process as well as the sorting of found candidates. The use of the QRD-QLD detection algorithm assumes pre-processing based on both QR decomposition and QL decomposition of the channel matrix. The QRD-QLD detection algorithm may be used, for example, as a detector in a downlink receiver (e.g., in a mobile subscriber equipment or more generally in a user equipment) in a wireless system with four transmit/receive antennas and 16-QAM modulation. More generally, the QRD-QLD detection algorithm may be used for any transmission between a base station and the mobile subscriber system that exhibits high spectral efficiency.
The implementation of the QRD-QLD detection algorithm may be particularly advantageous in an integrated circuit embodiment, such as embodied in an ASIC, as it provides benefits in terms of area requirements, cost and detection latency. One important benefit of the use of the QRD-QLD detection algorithm is that it may use the same arithmetic function units, such as the search function unit and sorting module, as the QRD-M approach. As a result, a common ASIC design may need only a modest hardware overhead to implement both the conventional QRD-M detection algorithm and the QRD-QLD detection algorithm in accordance with the exemplary embodiments of this invention.
As such, it should be noted that a reference herein to the “QRD-QLD detection algorithm” or to the “QRD-QLD algorithm” is not intended to limit the embodiments of the invention to computer program code or computer program instructions stored in a computer-readable memory medium. That is, the QRD-QLD algorithm may be implemented entirely in hardware (for example, within an integrated circuit), or it may be implemented as a combination of hardware and computer program instructions, or it may be implemented entirely in software (e.g., entirely as computer program instructions executed by, for example, a digital signal processor).
One significant advantage that is gained by the use of the QRD-QLD detection algorithm is a simplified candidate-search process. For example, if a wireless system with four transmit/receive antennas is assumed, the candidate-search process is divided into two parallel and independent search processes for two pairs of transmit antennas. The number of search operations is substantially reduced as compared to the QRD-M approach.
A search operation is considered herein as the testing of all constellation points to determine whether the constellation points are within a pre-determined spherical region. The updating of PEDs is based on the PED of a previously found parent candidate. After finding partial candidates for two pairs of transmit antennas, the best M partial candidates for each group are determined and combined together to provide a list of M² final candidates. It may be observed that the complexity of candidate-sorting (i.e., the sorting of computed PEDs) is reduced as compared to the conventional QRD-M approach (with M=16), where candidate-sorting is employed for each transmit antenna. Furthermore, the QL decomposition, as a pre-processing portion of the QRD-QLD algorithm, can be avoided and replaced with QR decomposition of the channel matrix with the columns in reverse order. As a result, an additional hardware unit for QL decomposition is not required.
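As a non-limiting illustration of the last point, a QL decomposition may be obtained from a QR routine by reversing the column order of the channel matrix, and then reversing the columns of the resulting Q and both the rows and the columns of the resulting R. The following is a minimal sketch assuming a full-rank matrix stored as a list of rows; the helper names mgs_qr and ql_via_reversed_qr are hypothetical, and the modified Gram-Schmidt routine is only one possible way to compute the QR decomposition.

```python
def mgs_qr(H):
    """Modified Gram-Schmidt QR of H (list of rows, full column rank).

    Returns (Q, R) with orthonormal columns in Q and upper-triangular R.
    """
    n_rows, n_cols = len(H), len(H[0])
    cols = [[H[i][j] for i in range(n_rows)] for j in range(n_cols)]
    q_cols = []
    R = [[0j] * n_cols for _ in range(n_cols)]
    for j in range(n_cols):
        norm = sum(abs(x) ** 2 for x in cols[j]) ** 0.5
        R[j][j] = norm
        q = [x / norm for x in cols[j]]
        q_cols.append(q)
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n_cols):
            R[j][k] = sum(qi.conjugate() * x for qi, x in zip(q, cols[k]))
            cols[k] = [x - R[j][k] * qi for x, qi in zip(cols[k], q)]
    Q = [[q_cols[j][i] for j in range(n_cols)] for i in range(n_rows)]
    return Q, R


def ql_via_reversed_qr(H):
    """QL decomposition obtained from a QR of the column-reversed matrix."""
    H_rev = [row[::-1] for row in H]          # reverse the column order
    Q_t, R_t = mgs_qr(H_rev)
    Q = [row[::-1] for row in Q_t]            # reverse columns of Q
    L = [row[::-1] for row in R_t[::-1]]      # reverse rows and columns of R
    return Q, L
```

Because the same QR routine serves both decompositions, only a reordering of inputs and outputs is added in this sketch.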
Describing now in further detail the QRD-QLD algorithm in accordance with the exemplary embodiments of this invention, reference is made first to
In contrast to
It can thus be observed by comparing
Regarding the latency of the detection process, the latency of the search operations plus candidate sorting, for the QRD-M approach, is given by:
This can be contrasted with the latency observed for the QRD-QLD detection algorithm, which is given by:
The parameters α, β and γ depend on the number of comparators used. In general, these parameters are equal to zero if there are a sufficient number of comparators for the maximum level of processing parallelism, otherwise they account for the additional comparison delay, in clock cycles, per search level.
For the QRD-M approach, at the first level the algorithm finds the M best out of $2^{M_C}$ candidates, and from the 2nd to the 4th levels finds the M best out of $M \cdot 2^{M_C}$ candidates. In contrast, the QRD-QLD algorithm finds the M best out of MaxCand candidates after the 2nd search level. With the available number of comparators, the algorithm finds the smallest PED, and excludes it from the sorting list, then finds the next smallest PED, excludes it from the sorting list, and so forth.
For a classical comparison-based sort of N numbers, the average complexity of an efficient algorithm (a merge sort, for example) is on the order of N·log2 N. However, for the search of the M smallest values (PED values) the complexity is reduced to M·(ξ+log2 N), with M<<N, where ξ is the additional latency if the available number of comparators cannot support full parallelism.
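The repeated extraction of the smallest PED may be sketched, purely as an illustration, as follows. The function names are hypothetical. The helper comparator_tree_stages gives the number of comparator stages, ceil(log2 N), that a binary comparator tree would need to find one minimum among N values when full parallelism is available (that is, with ξ equal to zero).

```python
def select_m_smallest(peds, M):
    """Return the indices of the M smallest PEDs by repeated minimum extraction."""
    remaining = list(range(len(peds)))
    chosen = []
    for _ in range(min(M, len(remaining))):
        best = min(remaining, key=lambda i: peds[i])  # one pass of the comparator tree
        chosen.append(best)
        remaining.remove(best)                        # exclude it from the sorting list
    return chosen


def comparator_tree_stages(n):
    """Stages of a binary comparator tree over n inputs: ceil(log2 n)."""
    return (n - 1).bit_length()
```

For example, with N=256 candidates each extraction passes through eight comparator stages in this model, so M extractions cost on the order of M·log2 N stage delays rather than the cost of a full sort.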
Reference is made to
The latency of updating the soft information to the SISO channel decoder 50 for a final candidate soft decision can be about one clock cycle (e.g. for a 200 MHz clock frequency). The overall latency for the QRD-QLD detection block 40, assuming M=18 and M² final candidates, is about 81 clock cycles.
Assuming a case of no feedback from the SISO channel decoder 50 to the QRD-QLD detection block 40 (to avoid increased latency and increased power dissipation), four parallel a posteriori probability functional units (APP FUs), 64 comparators for sorting, and two search units, the reduction in total latency, versus the conventional QRD-M approach, is illustrated in
It can be noted that the parameter M in the QRD-QLD detection block 40 can be variable, and may be a function of channel conditions. In general, a larger sum of the diagonal elements in the upper triangular matrix R implies a better channel condition, and vice versa (a smaller value of M is sufficient for good channels, and vice versa).
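One way such a channel-dependent choice of M could be expressed is sketched below. This is an illustration only: the function name, the single threshold and the two values of M are hypothetical placeholders and are not taken from the described embodiments, which state only that a larger sum of the diagonal elements of R permits a smaller M.

```python
def choose_m(R, threshold=4.0, m_good=8, m_bad=32):
    """Pick the list size M from the sum of the diagonal magnitudes of R.

    threshold, m_good and m_bad are hypothetical values for illustration.
    """
    diag_sum = sum(abs(R[i][i]) for i in range(len(R)))
    # Larger diagonal sum: better channel, so a smaller M is sufficient.
    return m_good if diag_sum >= threshold else m_bad
```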
It can be shown that, using the same hardware resources, the total detection latency with conventional QRD-M with M=16 is about 198 clock cycles, while the total detection latency with QRD-QLD with an average M of 17.75 is about 162 clock cycles, a reduction of roughly 18%.
As should be appreciated, a QRD-QLD soft sphere detection apparatus and method has been herewith described. The QRD-QLD detector 40 is suitable to be implemented (at least partially) in an integrated circuit, such as in a customized ASIC.
Describing now the exemplary embodiments in even greater detail, reference can be made to
Best partial candidates are stored in Look-up Tables (LUTs) 44A, 44B, 44C, 44D and combined together by blocks 45A, 45B into M² final candidates. The final candidates are utilized by four parallel a posteriori probability (APP) function units (APP FUs) 46A-46D via symbol-to-bit demappers 49A, 49B, with corresponding final EDs. The APP FU 46 computes the a posteriori reliability information for coded bits based on the list of final candidates.
Interface networks and interconnect networks are also depicted in
Pre-Processing Unit 47
The pre-processing unit 47 calculates the center of the hyper-sphere, as well as the common factors defined below in Eq. 2, which are used for testing all symbol-candidates for each transmit antenna. The center of the hyper-sphere is calculated as:
$$\hat{y} = Q^{H} \cdot y. \qquad (1)$$
Factors Fm are pre-computed in advance for all symbol-candidates according to:
$$F_m = \hat{y}_m - R_{mm} \cdot s_{mm}, \quad m = N_t, \ldots, 1. \qquad (2)$$
These factors are computed for all Nt transmit antennas, saved in the registers 42A, 42B, and utilized in the appropriate search level. It is not required to compute all 2·2M
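The pre-processing of Eqs. 1 and 2 may be illustrated by the following minimal sketch. The function names and the list-of-rows matrix layout are assumptions made for the illustration, and the sketch evaluates the factor for every constellation point of every antenna without any of the savings alluded to above.

```python
def hypersphere_center(Q, y):
    """Eq. 1: y_hat = Q^H * y, with Q given as a list of rows."""
    n_rows, n_cols = len(Q), len(Q[0])
    return [sum(Q[i][m].conjugate() * y[i] for i in range(n_rows))
            for m in range(n_cols)]


def precompute_factors(y_hat, R, constellation):
    """Eq. 2: F[m][q] = y_hat[m] - R[m][m] * s_q for each antenna m and symbol s_q."""
    return [[y_hat[m] - R[m][m] * s for s in constellation]
            for m in range(len(y_hat))]
```

The table F[m][q] can then be read by the search level that handles antenna m, so that only the interference terms of previously detected antennas remain to be subtracted during the search.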
Search Modules 48A, 48B
The search modules 48A, 48B simultaneously compute partial Euclidian distances (PEDs) for all PC=2M
As was noted, if one assumes a wireless system with Nt=4 transmit antennas the order in which transmit antennas are detected is irrelevant for the architecture design. In this particular case the order is: search for antenna 4 followed by antenna 2 in search module 48A, performed in parallel with the search for antenna 3 followed by antenna 1 in search module 48B. The search module 48 for the first detected transmit antenna (4th transmit antenna or the most reliable antenna determined after reordering of channel columns) computes PEDs for all 2M
$$P_4^{q} = |F_4^{q}|^2 \le r^2; \quad q = 1, \ldots, 2^{M_C}. \qquad (3)$$
For every valid candidate c4 from the first search level, cumulative PEDs of the fourth and second transmit antennas are computed in the second search level within the same search module 48 according to:
$$P_2^{q} = P_4(c_4) + |F_3^{q} - R_{43} c_4|^2 \le r^2; \quad q = 1, \ldots, 2^{M_C}. \qquad (4)$$
The second search module 48B computes in parallel the same equations for the third and first transmit antenna:
$$P_3^{q} = |F_1^{q}|^2 \le r^2; \quad q = 1, \ldots, 2^{M_C}. \qquad (5)$$
For every valid candidate c3 from the third transmit antenna (the second most reliable antenna), cumulative PEDs of the third and first transmit antennas are computed as:
$$P_1^{q} = P_3(c_3) + |F_2^{q} - R_{12} c_3|^2 \le r^2; \quad q = 1, \ldots, 2^{M_C}. \qquad (6)$$
If the maximum pre-determined number of candidates for two initial search levels is found, the search process stops. Partial vector-candidates are combined into M² final vector-candidates [c4c2c3c1] after determining the best M candidates for two pairs of transmit antennas. The best partial candidates are saved in the LUTs 44A-44D and used in the computation of APP messages for the outer decoder 50.
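The combining step may be illustrated by the following sketch, in which each partial candidate is assumed to be stored as a pair (PED, symbols). The function names and this data layout are assumptions made for the illustration; the computation of the final Euclidian distance for each combined vector is not shown.

```python
from itertools import product


def best_m(partials, M):
    """Keep the M partial candidates with the smallest PED."""
    return sorted(partials, key=lambda cand: cand[0])[:M]


def combine_candidates(pair_a, pair_b, M):
    """Combine the M best partial candidates of each antenna pair.

    pair_a entries are (ped, (c4, c2)); pair_b entries are (ped, (c3, c1)).
    Returns up to M*M final vector-candidates ordered as (c4, c2, c3, c1).
    """
    return [a[1] + b[1]
            for a, b in product(best_m(pair_a, M), best_m(pair_b, M))]
```

With M=18, for example, this yields 324 final vector-candidates, which, when shared among four parallel units processing one candidate per clock cycle, is consistent with a latency of about 81 clock cycles.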
For fully parallel computation of all products Rmj·cj inside every search level j (j=m, . . . , Nt), up to twelve FUCSA function units 60 (check/shift/add function unit, see
It can be observed that computation of Euclidian distances from Eq. 4 can be rewritten as:
$$P_2 = P_4(c_4) + |\mathrm{Re}\{X\} + i\,\mathrm{Im}\{X\}|^2 \le r^2, \qquad (7)$$
where $X = F_3 - R_{43} c_4$. There are four different values of Re{X} and four different values of Im{X} in the case of 16-QAM. Therefore, and referring also to
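The observation behind Eq. 7 may be illustrated by the following sketch, which computes the sixteen candidate PEDs from only four squared real parts and four squared imaginary parts. The sketch assumes an un-normalized 16-QAM alphabet with levels −3, −1, 1, 3 and a real-valued diagonal element of R; the function name and arguments are hypothetical.

```python
LEVELS = (-3, -1, 1, 3)  # assumed un-normalized 16-QAM amplitude levels


def peds_separable(p_parent, y_hat_m, r_diag, interference, radius_sq):
    """PEDs of all 16-QAM symbols at one search level, per Eq. 7.

    p_parent: PED of the parent candidate; y_hat_m: element of the sphere
    center; r_diag: real diagonal element of R; interference: product of
    the off-diagonal element of R and the parent symbol.
    Returns {symbol: PED} for the symbols that lie inside the sphere.
    """
    base = y_hat_m - interference
    # Only four distinct real parts and four distinct imaginary parts occur.
    re_sq = [(base.real - r_diag * a) ** 2 for a in LEVELS]
    im_sq = [(base.imag - r_diag * b) ** 2 for b in LEVELS]
    inside = {}
    for i, a in enumerate(LEVELS):
        for j, b in enumerate(LEVELS):
            ped = p_parent + re_sq[i] + im_sq[j]
            if ped <= radius_sq:
                inside[complex(a, b)] = ped
    return inside
```

In this sketch eight squaring operations replace sixteen complex magnitude computations per parent candidate, with the sixteen PEDs formed by additions only.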
APP Function Units 46A-46D
The APP FUs 46A-46D simultaneously compute the a posteriori probabilities for M·MC coded bits transmitted per one channel realization. A de-mapping of the final vector of symbol-candidates s into bits xk (k=1, . . . , M·MC) is employed. It can be noticed that an already-computed Euclidian distance ED(s) that corresponds to the final vector-candidate s can be directly used for computation of extrinsic probabilities:
In order to simplify the computation, the inner product between the a priori probabilities and coded bits is calculated using the sign-conversion and summation, and then the appropriate LA(xk) is excluded:
$$x_{[k]}^{T} \cdot L_{A,[k]} = x^{T} \cdot L_{A} - \mathrm{sign}(x_k) \cdot L_A(x_k), \quad \forall k = 1, \ldots, M \cdot M_C. \qquad (9)$$
M·MC updated extrinsic probabilities LE(xk|y) are directly stored into memory of the outer LDPC decoder 50.
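The simplification of Eq. 9 may be illustrated by the following sketch, which forms the full inner product once and then removes the k-th term for each bit. The sketch assumes that the coded bits are represented as +1 or −1, so that sign(xk) equals xk; the function name is hypothetical.

```python
def leave_one_out_inner_products(x, L_A):
    """Eq. 9: for every k, the inner product of x and L_A with term k excluded.

    x: coded bits as +1 / -1 values; L_A: a priori values, one per bit.
    """
    total = sum(xk * lk for xk, lk in zip(x, L_A))   # sign-conversion and summation
    return [total - xk * lk for xk, lk in zip(x, L_A)]
```

The full sum is thus computed once per final candidate, and each of the per-bit results costs one additional subtraction in this sketch.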
The Table in
Sorting Units 43A, 43B
The sorting function units 43A, 43B each include a binary tree of comparators 10 as shown in
In summary, as compared with conventional QRD-M proposed for 3GPP-LTE, IMT-advance, WLAN, WiMAX and an advanced 4G downlink, the QRD-QLD detection algorithm can provide about a two times improvement in the candidate search process for a same or similar error rate performance. Furthermore, only one iteration is needed between the detector 40 and the decoder 50. Furthermore, there is a smaller overall latency with the QRD-QLD approach, with an error rate performance gain over conventional QRD-M.
As was described above, in general the candidate-search process in accordance with these exemplary embodiments is divided into two independent sub-parts for two groups of transmit antennas (assuming an even number of transmit antennas). Within each group the search process is performed sequentially from one transmit antenna to another, providing the breadth-first searching for candidates within the sphere around the received point.
As was noted, the exemplary embodiments of this invention may be used as a detection means in downlink MIMO OFDM receivers for 3GPP-LTE, IMT-advance, 4G, WLAN, and WiMAX standards. It may be used in, for example, a wireless system with four transmit/receive antennas and 16-QAM modulation, which is one possible option in the 3GPP-LTE, IMT-advance, IEEE 802.11n, and IEEE 802.16 standards. The exemplary embodiments of this invention clearly provide superior performance to the conventional QRD-M algorithm with parameter M equal to 16, and provide enhancements in terms of maximum detection latency and error-rate performance.
As was also noted above, the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the exemplary embodiments of this invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
As such, it should be appreciated that at least some aspects of the exemplary embodiments of the inventions may be practiced in various components such as integrated circuit chips and modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be fabricated on a semiconductor substrate. Such software tools can automatically route conductors and locate components on a semiconductor substrate using well established rules of design, as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility for fabrication as one or more integrated circuit devices.
Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. As but some examples, the use of other similar or equivalent logical circuits and the like may be attempted by those skilled in the art. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
Further, while the exemplary embodiments have been described above in the context of the E-UTRAN (UTRAN-LTE) and other wireless systems, it should be appreciated that the exemplary embodiments of this invention are not limited for use with only these particular types of wireless communication systems, and that they may be used to advantage in other wireless communication systems.
It should be noted that the terms “connected,” “coupled,” or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are “connected” or “coupled” together. The coupling or connection between the elements can be physical, logical, or a combination thereof. As employed herein two elements may be considered to be “connected” or “coupled” together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical (both visible and invisible) region, as several non-limiting and non-exhaustive examples.
Furthermore, some of the features of the examples of this invention may be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles, teachings, examples and exemplary embodiments of this invention, and not in limitation thereof.