This application is directed generally to wireless communication systems. More particularly, but not exclusively, the application relates to methods and apparatus for receiving and decoding transmissions in communications systems using iterative decoding and a priori information to determine log likelihood ratios (LLRs).
Wireless communication systems are widely deployed to provide various types of communication content such as voice, data, video and the like, and deployments are likely to increase with introduction of new data oriented systems such as Long Term Evolution (LTE) systems. Wireless communications systems may be multiple-access systems capable of supporting communication with multiple users by sharing the available system resources (e.g., bandwidth and transmit power). Examples of such multiple-access systems include code division multiple access (CDMA) systems, time division multiple access (TDMA) systems, frequency division multiple access (FDMA) systems, Third Generation Partnership Project (3GPP) Long Term Evolution (LTE) systems and other orthogonal frequency division multiple access (OFDMA) systems.
Generally, a wireless multiple-access communication system can simultaneously support communication for multiple wireless terminals (also known as user equipments (UEs) or access terminals (ATs)). Each terminal communicates with one or more base stations (also known as access points (APs), Node Bs, or Enhanced Node Bs (eNBs)) via transmissions on forward and reverse links. The forward link (also referred to as a downlink or DL) refers to the communication link from the base stations to the terminals, and the reverse link (also referred to as an uplink or UL) refers to the communication link from the terminals to the base stations. These communication links may be established via a single-input-single-output (SISO), single-input-multiple-output (SIMO), multiple-input-single-output (MISO), or multiple-input-multiple-output (MIMO) system.
MIMO Orthogonal Frequency Division Multiplexing (OFDM) has been adopted for LTE-Advanced (LTE-A) systems to increase capacity and speed. Goals of LTE-A have been set to meet ambitious target data rates, such as 1 Gigabit/second (Gb/s) in local areas and 100 Megabits/second (Mb/s) in wide areas. In order to meet these goals, MIMO techniques and wide spectrum allocation of 100 MHz are expected to be deployed, which should provide potentially significant increases in wireless link capacity. However, a challenge of MIMO implementation has been demodulation and decoding complexity in receiver modules.
This disclosure relates generally to wireless communications systems. More particularly, but not exclusively, this disclosure relates to systems, methods, and apparatus for decoding transmitted signals in a wireless communications system. The decoding may include using a priori information to enhance decoding performance and/or reduce processing complexity.
In one aspect, the disclosure relates to a method of generating a log likelihood ratio (LLR) metric that may be used to decode a transmitted signal. The method may include generating a K-best set of values, and summing the K-best set of values to generate the LLR metric. The K-best set of values may be determined based at least in part on an a priori priority value. The a priori value may be provided from a decoder module, such as from a turbo decoder.
In another aspect, the disclosure relates to a method of generating a log likelihood ratio (LLR) metric for use in decoding a transmitted signal. The method may include, for example, determining a non-Gaussian approximation for a summation term of the LLR metric, evaluating the non-Gaussian approximation of the summation term, and generating the LLR metric based in part on the evaluation.
In another aspect, the disclosure relates to a method of generating a non-Gaussian approximation of a discrete probability mass function (pmf) for use in decoding a received signal. The method may include, for example, determining a non-Gaussian function approximation corresponding to the pmf, and integrating the non-Gaussian function to replace a summation in generating a value for use in decoding the received signal.
In another aspect, the disclosure relates to a method of generating a log likelihood ratio (LLR) metric for use in decoding a transmitted signal in a wireless communications system. The method may include, for example, generating a K-Best list of values based in part on an a priori value provided from a turbo decoder, determining a summation based on the K-Best list of values, and generating the LLR metric based in part on the summation.
In another aspect, the disclosure relates to computer program products including computer readable storage media having instructions for causing a computer to perform the above-described methods.
In another aspect, the disclosure relates to communication apparatus and devices configured to perform the above-described methods.
In another aspect, the disclosure relates to communication devices and apparatus including means for performing the above-described methods.
Additional aspects, features, and functionality are further described below in conjunction with the appended drawings.
The present application may be more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, wherein:
In accordance with various aspects as described subsequently herein, efficient iterative detection and decoding apparatus and methods for use in MIMO-OFDM based systems, as well as other communications systems, are described.
In an iterative receiver architecture, a detector and decoder may exchange information. Various algorithms differ in how the soft information is generated from the detector. However, different processing algorithms can be shown to be equivalent to using a Gaussian approximation on the interference in the LLR value or metric computation. The Gaussian approximation may not work well for high order modulations such as 64 or 256 quadrature amplitude modulation (QAM), which are used in LTE and LTE-A systems. To address this and other problems, in various aspects, methods and apparatuses are described herein.
Various additional aspects, details, functions, and implementations are further described below in conjunction with the appended drawings. Although the various aspects that follow are described primarily in the context of LTE systems and use LTE terms, in various implementations, the methods and apparatuses described herein may be used for wireless communication networks such as Code Division Multiple Access (CDMA) networks, Time Division Multiple Access (TDMA) networks, Frequency Division Multiple Access (FDMA) networks, Orthogonal FDMA (OFDMA) networks, Single-Carrier FDMA (SC-FDMA) networks, Wi-Max networks, as well as other communications networks. As described herein, the terms “networks” and “systems” may be used interchangeably.
A CDMA network may implement a radio technology such as Universal Terrestrial Radio Access (UTRA), cdma2000 and the like. UTRA includes Wideband-CDMA (W-CDMA), Time Division Synchronous CDMA (TD-SCDMA), as well as UTRA/UMTS-TDD 1.28 Mcps Low Chip Rate (LCR). Cdma2000 covers IS-2000, IS-95 and IS-856 standards. A TDMA network may implement a radio technology such as Global System for Mobile Communications (GSM).
An OFDMA network may implement a radio technology such as Evolved UTRA (E-UTRA), IEEE 802.11, IEEE 802.16, IEEE 802.20, Flash-OFDM and the like. UTRA, E-UTRA, and GSM are part of Universal Mobile Telecommunication System (UMTS). In particular, Long Term Evolution (LTE) is a release of UMTS that uses E-UTRA. UTRA, E-UTRA, GSM, UMTS and LTE are described in documents provided from an organization named “3rd Generation Partnership Project” (3GPP), and cdma2000 is described in documents from an organization named “3rd Generation Partnership Project 2” (3GPP2). LTE is a 3GPP project aimed at improving the Universal Mobile Telecommunications System (UMTS) mobile phone standard. The 3GPP may define specifications for the next generation of mobile networks, mobile systems, and mobile devices.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect and/or embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects and/or embodiments.
A MIMO system employs multiple (NT) transmit antennas and multiple (NR) receive antennas for data transmission. A MIMO channel formed by the NT transmit and NR receive antennas may be decomposed into NS independent channels, which are also referred to as spatial channels. The maximum spatial multiplexing NS if a linear receiver is used is min(NT, NR), with each of the NS independent channels corresponding to a dimension. This provides an NS increase in spectral efficiency. A MIMO system can provide improved performance (e.g., higher throughput and/or greater reliability) if the additional dimensionalities created by the multiple transmit and receive antennas are utilized. The spatial dimension may be described in terms of a rank.
MIMO systems support time division duplex (TDD) and frequency division duplex (FDD) implementations. In a TDD system, the forward and reverse link transmissions use the same frequency regions so that the reciprocity principle allows the estimation of the forward link channel from the reverse link channel. This enables the access point to extract transmit beamforming gain on the forward link when multiple antennas are available at the access point.
In some implementations a system may utilize time division duplexing (TDD). For TDD, the downlink and uplink share the same frequency spectrum or channel, and downlink and uplink transmissions are sent on the same frequency spectrum. The downlink channel response may thus be correlated with the uplink channel response. A reciprocity principle may allow a downlink channel to be estimated based on transmissions sent via the uplink. These uplink transmissions may be reference signals or uplink control channels (which may be used as reference symbols after demodulation). The uplink transmissions may allow for estimation of a space-selective channel via multiple antennas.
In LTE, a mobile station or device may be referred to as a “terminal,” “user device,” or “user equipment” (UE). A base station may be referred to as an evolved NodeB or eNB. A semi-autonomous base station may be referred to as a home eNB or HeNB. An HeNB may thus be one example of an eNB. The HeNB and/or the coverage area of an HeNB may be referred to as a femtocell, an HeNB cell or a closed subscriber group (CSG) cell (where access is restricted).
A user equipment (UE) 116 (also known as an access terminal or AT) may be within the cell coverage area and may be in communication with antennas 112 and 114, where antennas 112 and 114 transmit information to UE 116 over forward link (also known as a downlink or DL) 120 and receive information from UE 116 over a reverse link (also known as an uplink or UL) 118. Another UE 122 (and/or additional UEs not shown) may be in communication with antennas 106 and 108, where antennas 106 and 108 transmit information to UE 122 over forward link 126 and receive information over reverse link 124.
In a frequency division duplex (FDD) system, communication links 118, 120, 124 and 126 may use different frequencies for communication. For example, forward link 120 may use a different frequency than that used by reverse link 118. In a time division duplex (TDD) system, downlinks and uplinks may share the same spectrum.
Each group of antennas and/or the area in which they are designed to communicate is often referred to as a sector of the base station, and may be associated with sector coverage areas, which may be sub-areas of the base station cell coverage area. Antenna groups may each be designed to communicate to UEs in a sector of the cell area covered by the base station 100. In communication over forward links 120 and 126, the transmitting antennas of the base station 100 may utilize beam-forming in order to improve the signal-to-noise ratio of forward links for the different UEs 116 and 122. Also, the base station may use beam-forming to transmit to UEs scattered randomly through its coverage area, which may cause less interference to UEs in neighboring cells than an eNB transmitting through a single antenna to all its UEs.
An eNB, such as the base station 100, may be a fixed station used for communicating with the UEs and may also be referred to as an access point, a Node B, or some other equivalent terminology. In some system configurations, such as heterogeneous networks, the base station or eNB may be one of a variety of types and/or power levels. For example, the eNB may be associated with a macrocell, femtocell, picocell, and/or other type of cell. The eNB may be one of a range of different power levels, such as one of a type of macrocell eNB having any of a range of power levels.
A UE may also be denoted as an access terminal, AT, wireless communication device, terminal, or some other equivalent terminology. A UE may be implemented in the form of a wireless handset, computer or wireless module or device for use with a computer, personal digital assistant (PDA), tablet computer or device, or via any other similar or equivalent device or system.
Referring to
As shown in
A network controller element or core network element 250 may couple to a set of eNBs and provide coordination and control for these eNBs. Network controller 250 may be a single network entity or a collection of network entities. Network controller 250 may communicate with eNBs 242, 244, and 246 via a backhaul connection to a core network (CN) function. eNBs 242, 244, and 246 may also communicate with one another, e.g., directly or indirectly via wireless or wireline backhaul.
In some implementations, wireless network 200 may be a homogeneous network that includes only macro base stations or eNBs. Wireless network 200 may also be a heterogeneous network or hetnet that includes eNBs of different types, e.g., macro eNBs, pico eNBs, femto eNBs, relay nodes (RNs), etc. These different types of eNBs may have different transmit power levels, different coverage areas, and different impact on interference in wireless network 200.
For example, macro eNBs may have a high transmit power level (e.g., 20 Watts) whereas pico eNBs, femto eNBs, and relays may have a lower transmit power level (e.g., 1 Watt). The various techniques and aspects described herein may be used in different implementations for homogeneous and heterogeneous networks.
Network 200 may include one or more UEs. For example, network 200 may include UEs 230, 232, 234, 236, 238 and 240 (and/or other UEs not shown). The various UEs may be dispersed throughout wireless network 200, and each UE may be stationary, mobile, or both. As described previously, a UE may communicate with an eNB via a downlink (DL) and an uplink (UL). The downlink (or forward link) refers to the communication link from the eNB to the UE, and the uplink (or reverse link) refers to the communication link from the UE to the eNB. A UE may be able to communicate with macro eNBs, pico eNBs, femto eNBs, relay nodes, and/or other types of eNBs. In
Referring to
For example, UE 350 may include one or more modules to receive signals from base station 310 and/or other base stations (not shown, such as non-serving base stations or base stations of other network types as described previously herein) to access base stations, receive DL signals, determine channel characteristics, perform channel estimates, demodulate received data and generate spatial information, determine power level information, and/or other information associated with base station 310 or other base stations (not shown).
Base station 310 may coordinate with other base stations as described herein to facilitate operations such as forward handovers. This may be done in one or more components (or other components not shown) of base station 310, such as processors 314, 330 and memory 332. Base station 310 may also include a transmit module including one or more components (or other components not shown) of eNB 310, such as transmit modules 322. Base station 310 may include an interference cancellation module including one or more components (or other components not shown), such as processors 330, 342, demodulator module 340, and memory 332 to provide functionality such as redirection of served UEs, communication with associated MMEs, or other network nodes, signaling redirection information, PS suspension information, handover and context information, and/or other information such as is described herein.
Base station 310 may include a processor module including one or more components (or other components not shown), such as processors 330, 314 and memory 332 to perform base station functions as described subsequently herein and/or manage transmitter and/or receiver modules, which may be used to communicate with UEs or other nodes, such as other base stations, MMEs, etc. Base station 310 may also include a control module for controlling receiver functionality. Base station 310 may include a network connection module 390 to provide networking with other systems, such as backhaul systems in the core network (CN), as well as other base stations/eNBs, such as via module 390, or with other components such as are shown herein.
Likewise, UE 350 may include a receive module including one or more components, such as receivers 354 to receive and process MIMO signals. UE 350 may also include a processor module including one or more components (or other components not shown), such as processors 360 and 370, and memory 372, to perform the processing functions associated with MIMO functionality as described subsequently herein. This may include, for example, receiving, decoding, and processing received signals from two or more antennas.
Two or more signals received at UE 350 are processed to receive DL signals and/or extract information such as MIB and SIB information from the DL signals. Additional processing may include estimating channel characteristics, power information, spatial information, and/or other information associated with eNBs, such as base station 310 and/or other base stations, such as Node Bs (not shown), as well as facilitating communicating with other cells or networks and associated nodes, such as base stations or Node Bs of those different networks.
Memory 332 (and/or other memories not shown in
At the base station 310, traffic data for a number of MIMO data streams may be provided from a data source 312 to a transmit (TX) data processor 314, where the data may be processed and transmitted to one or more UEs 350. In one aspect, each data stream is processed and transmitted over a respective transmitter sub-system (shown as transmitters 322_1 through 322_Nt, and antennas 324_1 through 324_Nt) of base station 310. TX data processor 314 receives, formats, codes, and interleaves the traffic data for each data stream based on a particular coding scheme selected for that data stream so as to provide coded data. In particular, base station 310 may be configured to determine a particular reference signal and reference signal pattern and provide a transmit signal including the reference signal and/or beamforming information in the selected pattern.
The coded data for each data stream may be multiplexed with pilot data using OFDM techniques. The pilot data is typically a known data pattern that is processed in a known manner and may be used at the receiver system to estimate the channel response. For example, the pilot data may include a reference signal. Pilot data may be provided to TX data processor 314 as shown in
The modulation symbols for all data streams may then be provided to a TX MIMO processor 320, which may further process the modulation symbols (e.g., for OFDM implementation). TX MIMO processor 320 may then provide Nt modulation symbol streams to Nt transmitters (TMTR) 322_1 through 322_Nt. The various symbols may be mapped to associated RBs for transmission.
TX MIMO processor 320 may apply beamforming weights to the symbols of the data streams, the weights corresponding to the one or more antennas from which each symbol is being transmitted. This may be done by using information such as channel estimation information provided by or in conjunction with the reference signals and/or spatial information provided from a network node such as a UE. For example, a beam B=transpose([b_1 b_2 . . . b_Nt]) is composed of a set of weights corresponding to each transmit antenna. Transmitting along a beam corresponds to transmitting a modulation symbol x along all antennas scaled by the beam weight for that antenna; that is, on antenna t the transmitted signal is b_t*x. When multiple beams are transmitted, the transmitted signal on one antenna is the sum of the signals corresponding to different beams. This can be expressed mathematically as B_1*x_1 + B_2*x_2 + . . . + B_Ns*x_Ns, where Ns beams are transmitted and x_i is the modulation symbol sent using beam B_i. In various implementations beams could be selected in a number of ways. For example, beams could be selected based on channel feedback from a UE, channel knowledge available at the base station, or based on information provided from a UE to facilitate interference mitigation, such as with an adjacent macrocell.
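The superposition of beams described above can be illustrated with a short sketch. The function name `transmit_vector` and the example beam weights and symbols are hypothetical and chosen only for illustration; they are not taken from this disclosure.

```python
def transmit_vector(beams, symbols):
    """Per-antenna transmitted signal: the sum over beams i of B_i[t] * x_i."""
    n_antennas = len(beams[0])
    return [sum(b[t] * x for b, x in zip(beams, symbols))
            for t in range(n_antennas)]

# Hypothetical example: Ns = 2 beams over Nt = 2 transmit antennas.
beams = [[1, 1j], [1, -1j]]   # B_1 and B_2, one weight per antenna
symbols = [2, 3]              # x_1 and x_2, one modulation symbol per beam
signal = transmit_vector(beams, symbols)
```

Each entry of `signal` is the value sent on one antenna, so the list has Nt entries regardless of how many beams are superimposed.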
Each transmitter sub-system 322_1 through 322_Nt receives and processes a respective symbol stream to provide one or more analog signals, and further conditions (e.g., amplifies, filters, and upconverts) the analog signals to provide a modulated signal suitable for transmission over the MIMO channel. Nt modulated signals from transmitters 322_1 through 322_Nt are then transmitted from Nt antennas 324_1 through 324_Nt, respectively.
At UE 350, the transmitted modulated signals are received by Nr antennas 352_1 through 352_Nr and the received signal from each antenna 352 is provided to a respective receiver (RCVR) 354_1 through 354_Nr. Each receiver 354 conditions (e.g., filters, amplifies and downconverts) a respective received signal, digitizes the conditioned signal to provide samples, and further processes the samples to provide a corresponding “received” symbol stream.
An RX data processor 360 then receives and processes the Nr received symbol streams from Nr receivers 354_1 through 354_Nr based on a particular receiver processing technique so as to provide Ns “detected” symbol streams, which are estimates of the Ns transmitted symbol streams. The RX data processor 360 then demodulates, deinterleaves, and decodes each detected symbol stream to recover the traffic data for the data stream. The processing by RX data processor 360 is typically complementary to that performed by TX MIMO processor 320 and TX data processor 314 in base station 310.
A processor 370 may periodically determine a precoding matrix. Processor 370 may then formulate a reverse link message that may include a matrix index portion and a rank value portion. In various aspects, the reverse link message may include various types of information regarding the communication link and/or the received data stream. The reverse link message may then be processed by a TX data processor 338, which may also receive traffic data for a number of data streams from a data source 336, and may then be modulated by a modulator 380, conditioned by transmitters 354_1 through 354_Nr, and transmitted back to base station 310. Information transmitted back to base station 310 may include power level and/or spatial information for providing beamforming to mitigate interference from base station 310.
At base station 310, the modulated signals from UE 350 are received by antennas 324, conditioned by receivers 322, demodulated by a demodulator 340, and processed by a RX data processor 342 to extract the message transmitted by UE 350. Processor 330 may then determine which pre-coding matrix to use for determining beamforming weights, and then processes the extracted message.
Receive apparatus 450 may include multiple antennas (e.g., in the examples described subsequently 2 antennas are used; however, other antenna configurations and antenna numbers may be used in various embodiments). One or more receiver front end modules 452 may downconvert the signals received from the multiple antennas and provide an output to a MIMO processor 454. The MIMO processor may include a demapper module, which may include a joint LLR module, such as described subsequently, for generating an LLR metric for use in decoding the received signals. A turbo decoder 456 may be coupled to the demapper module, such as further illustrated and described subsequently. In particular, a priori information from the turbo decoder 456 may be used to improve and/or simplify decoding performance as described subsequently herein.
It is noted that this example and the subsequent examples are illustrated with respect to a two antenna configuration for purposes of clarity; however, the various aspects may be implemented in systems having more than two antennas in various configurations.
At the demapper 610, the extrinsic information may be used to improve the estimation of the received signal provided to a Log Likelihood Ratio (LLR) module 612. The LLR module 612 may be configured to compute an LLR metric, L(bk). Summation modules 614 and 634 may be included to add signal components as shown to generate LE1 and LE2.
As shown in equation 710, the LLR, L(bk) may be defined as the logarithm of the ratio of conditional probabilities that bk=0 and bk=1 summed over x1 and x2. Equation 710 may be rewritten in the form of equations 720 and 730 as shown in
However, the calculation required to evaluate equation 730 can be complex and processor intensive, particularly with larger symbol constellations. For example, in the case of a 256 QAM symbol constellation, summing over x1 and x2 (for 2 antennas), each taking 256 values, requires 256×256=64K (i.e., 65,536) calculations. In addition, configurations with more than two antennas may further add to the complexity and time for generating the LLR. Therefore, it may be desirable to reduce the number of terms in the summation or otherwise simplify the computation.
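To make the cost of the nested summation concrete, the following is a minimal brute-force sketch for a two-stream, real-valued 4-PAM case. It assumes uniform priors and an exponent of −∥y−Hx∥²/σ²; the scaling of the exponent, the function names, and the example channel values are assumptions made for illustration and are not taken from the equations of this disclosure.

```python
import math
from itertools import product

# Gray mapping (b1, b2) -> symbol value, as in the 4-PAM example of this description.
GRAY_4PAM = {(0, 1): -3, (0, 0): -1, (1, 0): 1, (1, 1): 3}

def sq_dist(y, H, x):
    """Squared Euclidean distance ||y - Hx||^2 for real-valued lists."""
    return sum((yr - sum(h * xi for h, xi in zip(row, x))) ** 2
               for yr, row in zip(y, H))

def exact_llr(y, H, sigma2, k, mapping=GRAY_4PAM):
    """Brute-force LLR of bit k (0-based) of stream 1; uniform priors assumed."""
    num = den = 0.0
    for (bits1, x1), (_, x2) in product(mapping.items(), repeat=2):
        w = math.exp(-sq_dist(y, H, (x1, x2)) / sigma2)
        if bits1[k] == 0:
            num += w   # hypotheses in which b_k = 0
        else:
            den += w   # hypotheses in which b_k = 1
    return math.log(num / den)

def num_terms(constellation_size, n_streams=2):
    """Number of joint hypotheses visited by the nested sum."""
    return constellation_size ** n_streams

H = [[1.0, 0.3], [0.2, 1.0]]   # hypothetical 2x2 channel
y = [-2.7, 0.4]                # noise-free observation of x = (-3, 1)
llr_b1 = exact_llr(y, H, sigma2=0.5, k=0)
```

Here `num_terms(256)` gives the 65,536 hypotheses noted above for two streams, and the count grows exponentially with the number of streams.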
As can be seen in equation 720, the term ∥y−Hx∥ represents a noise magnitude metric. As the estimate of x approaches the correct value, the magnitude of the noise metric term (y−Hx) in equations 720 and 730 will decrease and the square of this term will be correspondingly smaller. Consequently, the exponential of the negated squared term will be large relative to the exponentials for larger noise metric values. This may result in a summation where only a few terms, corresponding to values of x closest to the actual value, dominate the sum. Accordingly, in one simplification approach, the summation terms corresponding to larger values of ∥y−Hx∥ may be discarded, as they will add a relatively small amount to the sum. This can be viewed as identifying a smaller number of summation terms that contribute most of the value to the nested sum.
One solution to simplifying the summation over x2 as shown in
Existing implementations assume that the probability density function for x2 in equation (2) is Gaussian, which can be solved in a closed form fashion as follows:
Although Gaussian estimations of probability density such as described above may be used to simplify LLR determination by approximating a discrete probability mass function (pmf), they may not provide a good model of the probability characteristics.
In accordance with one aspect, this disclosure relates to generation and use of non-Gaussian probability approximations for use in LLR determination. Non-Gaussian approximations may be used, for example, directly in LLR determination such as described below. Alternately, or in addition, they may be used as discussed later in this disclosure to determine a K-best list of values for use in summations to determine an LLR.
To motivate use of non-Gaussian approximations, as one example, a four-level pulse amplitude modulation (4-PAM) implementation may have a symbol constellation where four possible symbol values are transmitted, corresponding to symbol X values of X=−3, −1, 1, and 3. This distribution may correspond to a two bit (b1, b2) mapping via, for example, a Gray code mapping (or other code chosen, for example, to minimize the number of bit errors associated with a symbol error) as follows: di-bit (0,1) maps to symbol value −3, (0,0) maps to −1, (1,0) maps to 1, and (1,1) maps to 3.
If the probability of bit b1=1 is 0.6 and the probability of b2=1 is 0.8, the corresponding probabilities are described by the pmf below:
Pr(X=−3)=0.32
Pr(X=−1)=0.08
Pr(X=1)=0.12
and
Pr(X=3)=0.48 (8)
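The pmf values in (8) follow from multiplying the per-bit probabilities under the Gray mapping given above, assuming the two bits are independent. A short sketch that reproduces them (the function name `symbol_pmf` is hypothetical):

```python
# Gray mapping (b1, b2) -> symbol value for the 4-PAM example.
GRAY_4PAM = {(0, 1): -3, (0, 0): -1, (1, 0): 1, (1, 1): 3}

def symbol_pmf(p_b1_one, p_b2_one, mapping=GRAY_4PAM):
    """Symbol probabilities from independent bit probabilities Pr(b1=1), Pr(b2=1)."""
    pmf = {}
    for (b1, b2), x in mapping.items():
        p1 = p_b1_one if b1 == 1 else 1.0 - p_b1_one
        p2 = p_b2_one if b2 == 1 else 1.0 - p_b2_one
        pmf[x] = p1 * p2
    return pmf

pmf = symbol_pmf(0.6, 0.8)
```

For example, the symbol −3 carries bits (0,1), so its probability is (1−0.6)×0.8=0.32.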
A continuous probability density function estimate may be generated corresponding to the discrete pmf values shown in (8) above, which may then be integrated (rather than summed as shown in inner sum 720 of
However, a Gaussian approximation may generate a probability density estimate that may have a poor correspondence with the actual discrete probability mass and may therefore not provide a good integral approximation of the summation shown in
Instead of using a Gaussian probability estimate (which would correspond with estimate curve 820 as shown in
An example of one embodiment of a non-Gaussian function 830 that may be used for the described 4-PAM case is illustrated in
Examples of embodiments of processes for generating such a non-Gaussian function that may be used for LLR determination are further described below.
For example, in the case of Binary Phase Shift Keying (BPSK) modulation, the random variable X (corresponding to the transmitted symbol) is assumed to take on the discrete values +1 and −1, with Pr(X=1)=p and Pr(X=−1)=1−p.
This discrete probability mass function (pmf) can be approximated by the function shown below, which can then be integrated:
For a given modulation constellation Q, with Pr(X=xi)=pi and Σpi=1, the pmf can be written in a polynomial form as:
The polynomial shown above in equation (11) is, however, difficult to integrate since, although a closed form exists for a 2nd order polynomial, a closed form is not known for 3rd order or higher polynomials.
In various embodiments, the pmf may instead be approximated with a second order polynomial approximation in the exponential function for any constellation. For example, the following approximation for Pr(X=x) may be used:
Pr(X=x)=exp(−(c+2rx+ax^2)) (12)
In this case, the coefficients may be determined as follows, which minimizes the distance to the desired values:
Curve 830 of
By generating a closed-form approximation for the probability density function, such as by using the above-described non-Gaussian approximation and coefficients, and integrating over the resulting continuous function, a simplified closed form LLR approximation value may be determined, which may be used to improve decoder efficiency and/or performance. In some implementations, other functions, for example other functional forms that provide closed form or otherwise efficient integration processing may be used in place of or in addition to a polynomial function.
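As an illustration of fitting the second-order form of equation (12) to a pmf, the sketch below performs an ordinary least-squares fit of −ln(p_i) to c+2rx_i+ax_i^2 for the 4-PAM pmf of (8). Fitting in the log domain is an assumption made here for illustration; it is not necessarily the distance measure or the coefficient formulas used in this disclosure, and the function names are hypothetical.

```python
import math

def fit_exp_quadratic(pmf):
    """Least-squares fit of -ln(p_i) to c + 2*r*x_i + a*x_i**2 (log-domain fit assumed)."""
    xs = list(pmf)
    rows = [[1.0, 2.0 * x, float(x * x)] for x in xs]
    t = [-math.log(pmf[x]) for x in xs]
    # Normal equations (A^T A) theta = A^T t, with theta = (c, r, a).
    m = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
         + [sum(row[i] * ti for row, ti in zip(rows, t))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(3):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [u - f * v for u, v in zip(m[i], m[col])]
    c, r, a = (m[i][3] / m[i][i] for i in range(3))
    return c, r, a

def approx(x, c, r, a):
    """Evaluate exp(-(c + 2*r*x + a*x^2)), the form of equation (12)."""
    return math.exp(-(c + 2.0 * r * x + a * x * x))

pmf = {-3: 0.32, -1: 0.08, 1: 0.12, 3: 0.48}
c, r, a = fit_exp_quadratic(pmf)
```

For this pmf the fitted coefficient `a` is negative, because the probability mass is concentrated at the outer symbols. The fitted function therefore grows with |x| rather than decaying as a Gaussian would, so it is only meaningful when evaluated or integrated over a bounded range around the constellation.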
In addition, in some embodiments it may be desirable to limit the bounds of integration for an x2 summation (or other similar or equivalent summation) used to generate an LLR metric. For example, the closed form Gaussian function integration, such as described previously herein and illustrated in
Attention is now directed to
In some implementations, the non-Gaussian approximation as described previously can be used to replace a summation term (e.g., the inner sum term as shown in
However, in some implementations it may further be advantageous to use Gaussian and non-Gaussian information to generate a list of values over which to sum. For example, by taking advantage of a priori information as may be provided from a turbo decoder module such as shown in
As noted previously, in general, certain terms of the summation shown in equation 730 may dominate. In one implementation, the summation may be replaced by a determined maximum term (e.g., a term that contributes a large amount to the total value of the sum). In this case the other terms may be discarded, with the maximum value used in place of the sum. This approach is known as the Max-Log Approximation (MLM), which can be used to approximate the LLR value as follows:
Another approach is known as the K-Best implementation (also denoted herein as the “traditional K-Best” approach).
In the traditional approach, the “K” best candidates (where K may take on predefined values such as 3, 4, 8, 16, or other values) are identified so as to minimize the noise term (e.g. ∥y−Hx∥) squared.
For example, this approach can be considered in the context of a 2-dimensional signal constellation 1000 as shown in
In this approach, the radius, r, of the hypersphere may be selected based on a noise metric, such as a function of ∥y−Hx∥ squared. Consequently, if the noise metric is small, the radius of hypersphere 1030 will be small, whereas with a higher noise metric the radius will be larger. The radius may be iterated to narrow the search to the K-Best values for addition to the list, i.e., the goal is to identify K hypothesis values 1010 within an area, volume, etc., defined by the hypersphere. The list is generated by checking only points within the hypersphere of radius r, for example, as follows:
The traditional K-Best approach does not, however, use a priori information for list generation. In accordance with one aspect, additional performance improvement may be obtained in a receiver by using a priori information to determine or choose the list values (also denoted here as an a priori K-Best list or a priori list). This information may be exchanged between, for example, a demapper and a turbo decoder element such as are shown in
One embodiment of this approach may be implemented as follows. Assuming that bk belongs to data stream 1, the K best x1 values may be determined such that the conditional probability of x1 conditioned on y is maximized (i.e. max Pr(x1/y)).
One approach to generating the conditional probability is to use a Gaussian approximation of Pr(x1/y). For example, the conditional probability Pr(x1 /y) may be determined as:
Information related to the probability of x2 may be known from the turbo decoder. In equation (16) above, the summation term can be approximated by an integral as:
$\propto \int \Pr(y \mid x_1, x_2)\, f(x_2)\, dx_2$ (17)
This integral may use a Gaussian or a non-Gaussian continuous function approximation for the pmf of x2 (e.g., f(x2)). For example, a Gaussian approximation closed form solution may be used as follows:
$\propto \exp\left(-(y - h_2\mu_2 - h_1 x_1)^H R^{-1} (y - h_2\mu_2 - h_1 x_1)\right)$ (18)
Alternately, a non-Gaussian function approximation, such as described previously, may also be used for f(x2).
By using this approach, a set of values may be selected to maximize the sum (e.g., the K-Best x1 values may be chosen to maximize the probability of x1 conditioned on y, that is, to maximize equation 18). For each x1, the best x2 can be found, resulting in K-best pairs of x1,x2. These K-best pairs may then be used in the summation to generate the LLR metric.
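The pair selection described above can be sketched in Python as follows. This is only an illustrative sketch under simplifying assumptions that are not taken from the disclosure: a real-valued model y = h1·x1 + h2·x2 + w with a single receive antenna (so that R in equation (18) is a scalar), a log prior on x1 added to the exponent of equation (18), and a 1/σ² scaling inside equation (20). The function name `apriori_k_best_pairs` and its arguments are hypothetical.

```python
import math

def apriori_k_best_pairs(y, h1, h2, sigma2, alphabet, prior1, prior2, k):
    """Return k (x1, x2) pairs chosen with a priori information.

    Assumed scalar real model: y = h1*x1 + h2*x2 + w, Var(w) = sigma2.
    prior1 / prior2 map each symbol to a strictly positive a priori probability.
    """
    # Matched mean and variance of x2 (Gaussian approximation of its pmf).
    mu2 = sum(x * prior2[x] for x in alphabet)
    var2 = sum((x - mu2) ** 2 * prior2[x] for x in alphabet)
    r = h2 * h2 * var2 + sigma2  # scalar counterpart of R in equation (18)

    def log_post_x1(x1):
        # Exponent of equation (18) plus the log prior of x1 (assumption).
        e = y - h2 * mu2 - h1 * x1
        return -e * e / r + math.log(prior1[x1])

    best_x1 = sorted(alphabet, key=log_post_x1, reverse=True)[:k]

    pairs = []
    for x1 in best_x1:
        # Equation (20) in the log domain: best x2 for this fixed x1.
        def log_g(x2, x1=x1):
            e = y - h1 * x1 - h2 * x2
            return -e * e / sigma2 + math.log(prior1[x1] * prior2[x2])
        pairs.append((x1, max(alphabet, key=log_g)))
    return pairs
```

Only the returned pairs would then enter the LLR summation, so the cost of the final sum grows with K rather than with the square of the constellation size.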
For example, after the K-Best list of x1, x2 values (or, in the case of systems with additional antennas, the best values over the total of the received signals conditioned on y) is found (i.e., by taking into account the a priori information such as described above), equation (19) below may be evaluated over only the list values (e.g., rather than performing the summation over all possible values, only the a priori determined K-Best list values are included in the summation):
Since equation (18) is a quadratic form, processing as described subsequently may be used to simplify the calculation. In general, in order to determine the a priori K-Best values, the goal is to determine the pairs x1, x2 corresponding to the a priori K-Best values (e.g., the best x2 value for each determined x1 value). A direct approach to computing the a priori K-Best values would be to evaluate the equation for each x1 to find the maximum valued x1, x2 pair to generate the list. However, this creates complexity as the constellation size increases because each x1 value must be evaluated (e.g., for a 256 QAM constellation, 256 values of x1 would need to be evaluated).
In one approach, once the x1 values are found, the best x2 values may then be found as shown in equation (19), where, for each x1(k) from equation (18), a Hard-SIC hypothesis may be calculated as:
$g^{+}(x_1^{(k)}, x_2) = \exp\left(-\|y - h_1 x_1^{(k)} - h_2 x_2\|^2\right) \Pr(x_1^{(k)}, x_2)$ (20)
In this case, the probability of x1(k), x2 may be approximated by a continuous function, such as a Gaussian function as described previously. In this case, it becomes quadratic in x2, thereby allowing efficient evaluation, such as described subsequently. Using this approach, the LLR metric may then be determined as:
A closed form solution for equation 20 can be expressed as a second order polynomial in x2, and the exponential form is also quadratic. Similarly, for the non-Gaussian approximation, a second order polynomial form can be used (such as described previously herein).
Using this approach, the values can be readily identified by finding the minimum value on the polynomial curve. An example of this is illustrated in graph 1200 of
$A x_1^2 + 2B x_1 + C$ (22)
Polynomial function 1210 may correspond to either a Gaussian or non-Gaussian approximation, such as described previously. The a priori K-Best values may be obtained by determining the minimum value of the polynomial function and searching for the nearest symbol values. For example, in one search strategy list values are searched in a zigzag fashion from the minimum function value −B/A.
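The zigzag search around the minimum −B/A can be sketched in Python as follows. This is a minimal illustration, assuming a real-valued, ascending PAM alphabet and A > 0 so that the vertex is a minimum; the function name `zigzag_k_best` is hypothetical. Because the quadratic is symmetric about its vertex, the points nearest the vertex are also the points with the smallest polynomial value.

```python
def zigzag_k_best(a, b, k, alphabet):
    """Return the k alphabet points nearest the vertex of a*x^2 + 2*b*x + c.

    Assumes a > 0 (so -b/a is the minimum) and alphabet sorted ascending.
    """
    if a <= 0:
        raise ValueError("a must be positive for the vertex to be a minimum")
    vertex = -b / a
    # Start from the symbol closest to the vertex.
    start = min(range(len(alphabet)), key=lambda i: abs(alphabet[i] - vertex))
    out = [alphabet[start]]
    lo, hi = start - 1, start + 1
    # Alternate outward, always taking the nearer of the two frontier symbols.
    while len(out) < k and (lo >= 0 or hi < len(alphabet)):
        take_lo = hi >= len(alphabet) or (
            lo >= 0
            and abs(alphabet[lo] - vertex) <= abs(alphabet[hi] - vertex)
        )
        if take_lo:
            out.append(alphabet[lo])
            lo -= 1
        else:
            out.append(alphabet[hi])
            hi += 1
    return out
```

For an 8-PAM alphabet and a vertex at 0.4, the visiting order would be 1, −1, 3, −3, and so on, so only K polynomial evaluations are needed instead of one per constellation point.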
For example, in the example of
The embodiment described previously with respect to equations 16-18, 20, and 21, in which the a priori K-best x1 values are identified such that Pr(x1/y) is maximized, uses a summation over x2. This approach may be denoted as the a priori K-Best Sum approach.
In another embodiment, denoted as an a priori K-best Max approach, a priori K-Best x1 values may be determined so that $\max_{x_2} \Pr(x_1, x_2 / y)$ is maximized. In this approach, a continuous function approximation may be used, such as using a Gaussian or non-Gaussian function. If x2 is approximated as Gaussian, such as described previously, the polynomial search method as described with respect to
As shown in equation (18), a matrix inversion is required to evaluate the exponential function (i.e., R−1). In general, evaluating this inversion is complex. In accordance with another aspect, the matrix inversion may be simplified by reducing the dimension of the matrix. One embodiment of this approach is described in the subsequent exemplary embodiment section. In addition, details of an exemplary embodiment of processing for performing channel inversion are also described.
The disclosure that follows provides various additional details, features, and functions associated with embodiments for LTE OFDM implementations. These details are provided for purposes of further explanation of various aspects, and are not in any way intended to be limiting.
An example MIMO OFDM system with M transmit and N receive antennas may be considered as follows. The example system has Ns subcarriers in an OFDM block. There are M data streams to be transmitted. A constellation is applied on each stream m, where Cm is the number of bits per constellation symbol. The incoming bits of each stream m, of length NsCmRm, m=1, . . . ,M, are encoded using a channel code (typically a convolutional or turbo code) of rate Rm, resulting in a bit vector bm. The encoded bits are converted into symbols using a mapping function xi,m=Mm(bm((i−1)Cm+1:iCm)), written in Matlab notation (e.g., Gray mapping or set partitioning mapping), i=0, . . . ,Ns−1, where xi,m is the symbol to be transmitted over subcarrier i and antenna m. The Inverse Discrete Fourier Transform (IDFT) of the data block x0,m, . . . ,xNs−1,m yields the time domain sequence, i.e.,
The time domain symbol Xj,m is assumed to obey the component-wise energy constraint $E\{\|X_{j,m}\|^2\} = E_s/M$. A cyclic prefix (CP) is added to mitigate the residual ISI due to the previous OFDM symbol. After parallel-to-serial (P/S) conversion, the signal is transmitted from the corresponding antenna. The channel between each transmitter/receiver pair is modeled as a multipath channel. The channel between transmit antenna m and receive antenna n is expressed as
where Γn,m is the number of taps, αn,m,l is the lth complex path gain, and τn,m,l is the corresponding path delay. A block fading model may be assumed, where the channel is assumed to be constant in each OFDM data block.
At the receiver side, serial-to-parallel (S/P) conversion is first performed and the CP is removed. After DFT operation, the received signal in frequency domain can be expressed as
where n denotes the receiver antenna indexing, wi,n is the additive white Gaussian noise (AWGN) with zero mean and variance σ2, and
where Ts is the symbol duration. (A4) can be written in vector form as
$y_i = H_i x_i + w_i, \quad i = 0, \ldots, N_s - 1.$ (A5)
The MIMO-OFDM system may be modeled as shown in
Relationships between existing iterative decoding and detection algorithms are described below. The channel code and the MIMO channel can be considered as a serially concatenated scheme with an outer channel encoder and inner constellation mapping with block encoding matrix Hi at each subcarrier. To decode b1, . . . ,bM, the optimal joint detector and decoder should compute the likelihood of each bit given all the received signals y0, . . . ,yNs−1 on all subcarriers. However, this is generally computationally impractical. Several algorithms, such as those described previously in the papers cited, solve this problem approximately using the “turbo principle”, where information is exchanged between the detector (inner mapping) and decoder (outer encoder) in an iterative fashion until desired performance is attained.
Extrinsic information at each subcarrier may be generated from the received signals on that subcarrier using the a priori information on each bit from the channel decoder. The generated extrinsic information on all subcarriers may then be passed to the soft-in soft-out channel decoder (e.g., the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm) for the next iteration of decoding and detection.
Different joint detection and decoding algorithms share the same outer channel decoder. Their difference lies in how the extrinsic information from the inner mapping is generated and used. An iterative decoding and demodulation for a MIMO-OFDM consistent with the configuration shown in
The a priori probability (APP) is usually expressed as a log-likelihood ratio (LLR) value, whose magnitude indicates the reliability of the decision. In the examples described subsequently, the logical zero for a bit is represented by amplitude level bi=−1 and logical one by bi=+1, respectively.
After obtaining the APP from the channel decoder (initially the APP is set to be zero), the a posteriori LLR value of the bit bi, i=0, . . . , Σm=1MCm−1 conditioned on the received vector y is
If it is assumed that the interleaver at the encoder is ideal such that the bits in each modulation symbol are approximately statistically independent of one another, (A6) can be rewritten using Bayes' theorem as
where Xi,+1 and Xi,−1 are the sets of $2^{\sum_{m=1}^{M} C_m - 1}$ symbol vectors such that the i-th bit is +1 or −1, respectively, i.e., Xi,±1={x|M(b)=x, bi=±1}, b=B(x) is the inverse mapping of x=M(b), and Bj(x) is the j-th bit of B(x).
In case of Gaussian channel as in (A5), L(bi|y) can further be written as:
Using the definition of LA(bi), (A8) can be rewritten as:
Instead of using (A9) directly, max-log approximation is adopted to compute LE(bi|y) as:
However, the simplification in equation (A10) still has a complexity exponential in the total number of bits or Σm=1MCm−1. A list sphere decoder (LSD) is used to resolve this issue by searching only over a list containing Ncand elements, i.e.,
The list is generated by checking only points within the hypersphere of radius r, i.e.,
$\|y - Hx\|^2 \le r^2.$ (A12)
The list chooses the Ncand points within the hypersphere that make $\|y - Hx\|^2$ smallest. The radius r is chosen according to the noise variance such that the number of points within the hypersphere is not far away from Ncand. The performance of the LSD based algorithm depends on the size of the list. When the list size is equal to the number of all possible constellation points, i.e., $2^{\sum_{m=1}^{M} C_m - 1}$, (A11) reduces to (A10).
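The list construction in (A12) can be illustrated with the following Python sketch. It is not a sphere decoder: an actual LSD prunes a search tree, whereas this sketch enumerates every candidate exhaustively, which is only practical for tiny examples. It assumes a real-valued system with the same alphabet on every stream; the function name `sphere_list` is hypothetical.

```python
import itertools

def sphere_list(y, H, alphabet, radius, n_cand):
    """Brute-force stand-in for the LSD candidate list (real-valued system).

    Keeps the points x with ||y - Hx||^2 <= radius^2 and returns at most
    n_cand of them, ordered by increasing distance.
    """
    n_rx, n_tx = len(H), len(H[0])
    scored = []
    for x in itertools.product(alphabet, repeat=n_tx):
        d2 = sum(
            (y[n] - sum(H[n][m] * x[m] for m in range(n_tx))) ** 2
            for n in range(n_rx)
        )
        if d2 <= radius ** 2:
            scored.append((d2, x))
    scored.sort()
    return [x for _, x in scored[:n_cand]]
```

A larger radius admits more points, and once the list covers every candidate the list-based LLR coincides with the full max-log computation.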
In one implementation, iterative detection using a Gaussian approximation may be used. As described previously, the complexity of directly computing the LLR value from (A7) is high. (A7) can be written as:
where xm denotes the symbol that bi belongs to, i.e., $\sum_{m'=1}^{m-1} C_{m'} \le i < \sum_{m'=1}^{m} C_{m'}$, x−m denotes the vector containing all entries of x except the m-th entry, and $X^m_{i,+1}$ and $X^m_{i,-1}$ are the sets of $2^{C_m - 1}$ symbols such that bi is +1 or −1, respectively. From (A13), $\sum_{x_{-m}} \Pr(y \mid x_{-m}, x_m) \Pr(x_{-m})$ is computed for any given xm.
A suboptimal approach is to replace the summation over x−m with an integration over a continuous distribution, such as described previously herein. One typical assumption is to use the Gaussian distribution. Assuming the entries of x−m are independent Gaussian random variables with mean:
and variance:
m′=1, . . . ,M, m′≠m. When the Gaussian channel model (A5) is used:
where the integral is from −∞ to ∞ in each dimension, H−m contains the columns of H except the m-th column, hm is the m-th column of H, μ−m=[μ1, . . . ,μm−1, μm+1, . . . ,μM]T,
$R_m = H_{-m} \operatorname{diag}\{v_1^2, \ldots, v_{m-1}^2, v_{m+1}^2, \ldots, v_M^2\} H_{-m}^H + \sigma^2 I_N,$ (A17)
and IN is an N by N identity matrix. Substituting (A16) into (A13), the LLR value under a Gaussian approximation is obtained. The complexity of computing LLR reduces from 2Σm=1M Cm to 2C
A probabilistic data association (PDA) method may be applied to uncoded MIMO systems. The concept can be extended to coded systems to compute Pr(y|xm). In PDA, H−mx−m+w is assumed to be Gaussian with matched mean H−mμ−m and covariance Rm in (A17). The PDA method obtains Pr(y|xm) as in (A16).
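The matched mean and variance used by the Gaussian approximation and by PDA can be computed from the a priori LLRs, as in the following Python sketch. It assumes independent bits, the convention LA(b) = ln(Pr(b=+1)/Pr(b=−1)), and a caller-supplied bit-to-symbol mapping; the function names and the 4-PAM Gray mapping used in the usage note are illustrative assumptions.

```python
import math

def bit_prob_plus(llr):
    """Pr(b = +1) for an a priori LLR L = ln(Pr(b=+1) / Pr(b=-1))."""
    return 1.0 / (1.0 + math.exp(-llr))

def symbol_pmf(llrs, mapping):
    """Symbol pmf from per-bit LLRs, assuming independent bits.

    mapping: dict from tuples of +1/-1 bits to constellation symbols.
    """
    pmf = {}
    for bits, sym in mapping.items():
        p = 1.0
        for b, llr in zip(bits, llrs):
            p_plus = bit_prob_plus(llr)
            p *= p_plus if b == 1 else 1.0 - p_plus
        pmf[sym] = pmf.get(sym, 0.0) + p
    return pmf

def matched_moments(pmf):
    """Mean and variance of the Gaussian matched to a symbol pmf."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var
```

For example, with the mapping {(−1,+1): −3, (−1,−1): −1, (+1,−1): 1, (+1,+1): 3} and bit probabilities Pr(b1=+1)=0.6 and Pr(b2=+1)=0.8, the matched mean is 0.52 and the matched variance is 7.1296.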
In iterative multiuser detection, a soft MMSE interference cancellation scheme may be used. Translating this scheme to the MIMO case, when computing the LLR, the soft estimates of all the symbols except xm using (A14) may be used to soft cancel the interference in y to obtain:
$y_m = y - H_{-m}\mu_{-m}.$ (A18)
If ym is used directly and assuming the interference in ym is Gaussian, it can be shown that Pr(y|xm) is given by (A16). Instead of using ym to generate the LLR directly, a linear MMSE filter um is applied to ym to obtain
$\hat{x}_m = u_m^H y_m.$ (A19)
where um is chosen to minimize the mean-squared error between x̂m and xm, i.e.,
Using standard LMMSE estimation theory:
$u_m = \hat{R}_m^{-1} h_m.$ (A21)
where
$\hat{R}_m = H_{-m} \operatorname{diag}\{v_1^2, \ldots, v_{m-1}^2, v_{m+1}^2, \ldots, v_M^2\} H_{-m}^H + h_m h_m^H + \sigma^2 I_N = R_m + h_m h_m^H.$ (A22)
Substituting (A18) into (A19):
If the interference ηm in (A23) is approximated as Gaussian:
$\Pr(y \mid x_m) = \Pr(\hat{x}_m \mid x_m) \propto \exp\left(-(\hat{x}_m - u_m^H h_m x_m)^H (u_m^H R_m u_m)^{-1} (\hat{x}_m - u_m^H h_m x_m)\right) = \exp\left(-(y - H_{-m}\mu_{-m} - h_m x_m)^H u_m (u_m^H R_m u_m)^{-1} u_m^H (y - H_{-m}\mu_{-m} - h_m x_m)\right).$ (A24)
Even though (A24) may appear to be different from (A16), it can be shown that (A24) is proportional to (A16), which means that LLR values computed by using (A16) and (A24) are equal. They are not equivalent in general unless um is invertible (e.g., um is an M by M matrix). This is an interesting phenomenon as from the data processing lemma, the mutual information between ym and xm is greater than or equal to that between x̂m and xm.
For Gaussian signals, the linear MMSE filter does not change the mutual information. By making a Gaussian assumption on the transmitted signals, it can be seen that LMMSE also preserves the LLR value even though (A16) works on an N×1 vector y and (A24) only uses a scalar x̂m. The approximation (A24) has a complexity advantage over (A16). In (A24), there are only two vector multiplications to obtain x̂m and umHhm, and computing Pr(x̂m|xm) only involves scalar operations thereafter. On the other hand, in (A16), a vector operation is needed for each xm.
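The proportionality between (A16) and (A24) can be checked numerically with the following Python sketch. It assumes a toy system with N = 2 receive antennas and M = 2 streams, with m = 1, and it takes the exponent of (A16) to be the quadratic form in Rm defined by (A17). All helper names are hypothetical, and the 2×2 inverse is written out by hand so that only the standard library is needed.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

def herm_inner(u, v):
    """u^H v for length-2 complex vectors."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]

def exponents(y, h1, h2, mu2, var2, sigma2, x1):
    """Negated exponents of (A16) and (A24) for one hypothesis x1."""
    # R_m per (A17) and R-hat_m per (A22), with stream 2 as the interferer.
    rm = [[var2 * h2[i] * h2[j].conjugate() + (sigma2 if i == j else 0.0)
           for j in range(2)] for i in range(2)]
    r_hat = [[rm[i][j] + h1[i] * h1[j].conjugate() for j in range(2)]
             for i in range(2)]
    # Residual after soft cancellation (A18) and removal of h1 * x1.
    e = [y[i] - h2[i] * mu2 - h1[i] * x1 for i in range(2)]
    full = herm_inner(e, matvec(inv2(rm), e)).real        # vector form
    u = matvec(inv2(r_hat), h1)                           # LMMSE filter (A21)
    scalar = abs(herm_inner(u, e)) ** 2 / herm_inner(u, matvec(rm, u)).real
    return full, scalar
```

In this sketch the difference `full - scalar` is the same for every hypothesis x1, so the two likelihoods differ only by a factor that cancels in the LLR ratio, which is consistent with the statement above.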
In MMSE equalization, the MMSE equalizer applies an affine filter on the received signal y directly (different from where the LMMSE filter is applied after canceling soft symbol estimates), i.e.,
$\hat{x}_m = a_m^H y + c_m.$ (A25)
where
$a_m = \operatorname{Cov}(y,y)^{-1} \operatorname{Cov}(y,x_m) = \hat{R}_m^{-1} h_m,$
$c_m = E\{x_m\} - a_m^H E\{y\} = -a_m^H H_{-m}\mu_{-m}.$ (A26)
It can be noted that E{xm} depends on LA(bi′), ∀i′ such that $\sum_{m'=1}^{m-1} C_{m'} \le i' < \sum_{m'=1}^{m} C_{m'}$. As LA(bi′) is used after computing Pr(y|xm) in (A13), a priori information should not be used on xm, i.e., xm is assumed to be uniformly distributed over its constellation, so that E{xm}=0. Comparing (A26) with (A21), it can be seen that um=am and x̂m in (A25) is equal to that in (A19). Therefore, the MMSE equalizer may be equivalent, in some cases, to the soft MMSE interference cancellation.
Consequently, the processing algorithms can be considered as using a Gaussian approximation as in (A13). As such, they reduce the complexity of the LLR value, which is at the expense of performance degradation.
There are various problems associated with the existing processing algorithms as described above. For many practical wireless communications standards such as LTE, high order constellations such as 64QAM or 256 QAM have been adopted. The max-log approximation in (A10) may not work well with high order constellations as the number of terms in the summation in (A7) is large. Moreover, the LSD may be hard to implement in hardware directly due to its sequential nature.
The Gaussian approximation based algorithms avoid the max-log approximation, but the Gaussian assumption incurs some performance loss. It is noted that the performance of PDA or Gaussian approximation algorithms may not be good for higher order modulations.
To address these concerns, as well as provide other potential advantages and/or improvements, a class of non-Gaussian approximations may be used for LLR metric computation. As practical constellations have a finite alphabet structure, the non-Gaussian distribution may be integrated over a bounded set instead of from −∞ to +∞.
In another aspect, a combination of K-best algorithm and the non-Gaussian approximation may be used. In the K-best algorithm, K branches may be kept at each decoding stage and the branches may be pruned using the non-Gaussian approximation. Instead of using only the maximum of the K remaining metrics as in (A10), the sum of all the K metrics may be used to compute the LLR. The resulting algorithm may be readily parallelized in hardware.
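The difference between using only the maximum metric and using the sum of the K retained metrics can be sketched in Python as follows. The sketch assumes that each retained candidate carries a log-domain metric (for example, the log of its likelihood times its prior); the function names are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def llr_from_lists(metrics_plus, metrics_minus, use_sum=True):
    """LLR of one bit from the retained candidates' log-domain metrics.

    metrics_plus / metrics_minus: metrics of the candidates whose bit is
    +1 / -1. use_sum=False keeps only the largest metric on each side.
    """
    if use_sum:
        return log_sum_exp(metrics_plus) - log_sum_exp(metrics_minus)
    return max(metrics_plus) - max(metrics_minus)
```

When several candidates on one side have comparable metrics, the summed form can differ noticeably from the maximum-only form and may even change the sign of the LLR, which is one motivation for keeping all K terms.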
In the subsequent embodiment examples, it is assumed that squared-QAM is used at all transmit antennas, which is the case in many wireless communications standards. However, the proposed processing method and algorithm can be readily extended to other general constellations.
With squared-QAM, (A5) can be written as a real system, i.e.,
where ℜ(x) and ℑ(x) denote the real part and imaginary part of x, respectively, and the entries of x̃i are from PAM constellations. With a slight abuse of notation, (A5) may still be used to represent the real system (A27) as follows, with the entries of xi from PAM.
As described previously herein, to motivate a non-Gaussian approximation, we may start with a BPSK constellation, i.e., X ∈ {+1, −1}. Let Pr(X=+1)=p and Pr(X=−1)=1−p. This probability mass function (pmf) may be written as a single equation as:
A continuous approximation to this pmf may be generated by relaxing x to be a real number with a scaling factor to keep ∫Pr(X=x)dx=1. It is noted that there are several choices of the pmf (A28). For example, we can choose
However, this function will go to ∞ when x goes to ∞, which is undesired.
may also be chosen; however, this function is hard to integrate in closed form.
This approach may be extended to higher order modulations. In general, for a given modulation with Pr(X=xi)=pi and Σpi=1, the pmf may be written in a single equation as:
A pdf approximation may be obtained by relaxing x to be a real number. When the constellation contains more than two points, if (A29) is used directly in (A16), the integral involves a polynomial of order greater than two in the exponential function, whose closed form may be difficult to obtain. Therefore, the pmf (A29) may be approximated with a second order polynomial in the exponential function for any constellation, i.e.,
$\Pr(X=x) = \exp\left(-(c + 2rx + ax^2)\right).$ (A30)
It is noted that the Gaussian distribution is a special case of (A30), which contains only two variables. The coefficients a, r, c may be found by solving:
where ωi≥0 is a weight for symbol xi. In practical systems, only symbols with the largest probability may be considered. In this case, we may choose ωi=1 for the three largest probability symbols and ωi=0 otherwise. The solution of (A32) can be readily obtained by a least squares approach.
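A weighted least squares fit of the coefficients c, r, and a can be sketched in Python as follows. Because the fitting criterion itself is not reproduced here, the sketch assumes that the fit is performed in the log domain, that is, the weighted squared error between −ln(pi) and c + 2r·xi + a·xi² is minimized. The function names are hypothetical, and a small hand-written solver is used so that only the standard library is required.

```python
import math

def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return x

def fit_quadratic_exponent(symbols, probs, weights):
    """Weighted LS fit of -ln(p_i) ~ c + 2*r*x_i + a*x_i^2; returns (c, r, a).

    Symbols with zero weight are ignored; at least three distinct symbols
    must carry a positive weight for the system to be solvable.
    """
    used = [(x, p, w) for x, p, w in zip(symbols, probs, weights) if w > 0]
    rows = [(1.0, 2.0 * x, float(x * x)) for x, _, _ in used]
    targets = [-math.log(p) for _, p, _ in used]
    wts = [w for _, _, w in used]
    # Normal equations of the weighted least squares problem.
    normal = [[sum(w * r[i] * r[j] for w, r in zip(wts, rows))
               for j in range(3)] for i in range(3)]
    rhs = [sum(w * r[i] * t for w, r, t in zip(wts, rows, targets))
           for i in range(3)]
    c, r, a = solve3(normal, rhs)
    return c, r, a
```

With unit weights on exactly three symbols the fit passes through those three probabilities, and the fitted coefficient a can be negative when the probability mass is concentrated at the outer symbols.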
However, as noted previously, a Gaussian approximation may not be good for some pmfs, and the integration in (A16) is from −∞ to +∞, which may distort the LLR value. It is noted that practical constellations are typically finite alphabets, e.g., 2D-PAM is {−2D+1,−2D+3, . . . ,2D−3,2D−1}. The integration range may be bounded, for example by integrating from −U to U instead. Possible choices of U include 2D or 2D−1+σ.
When U=2D, Pr(X=d) may be approximated by the integral between d−1 and d+1. When U=2D−1+σ, Pr(X=d) may be approximated similarly as when U=2D, but taking into account the noise variance at the two boundary points. With (A30) and the finite integration, (A16) can be written as:
where r−m=[r1, . . . ,rm−1,rm+1, . . . ,rM]T and A−m=diag{a1, . . . ,am−1,am+1, . . . ,aM}, rm′, and am′ may be obtained from (A31) and (A32). Comparing with (A16), it may be noted that there are two main differences. First, r−m and A−m are not from the matched mean and variance but from matching the pmf directly. Second, the integral is from −U to U.
To compute the integral in (A33), we may let the singular value decomposition of Rm be $V^T \Lambda V$ and g(xm)=Vb−m, where Λ=diag{λ1, . . . ,λM−1}, making a change of variables by defining z=Vx−m. However, the integration region of z is an (M−1)-dimensional polytope, which makes the integral difficult to compute. For simplicity, the integration region may be enlarged by setting a bound $Z_i = U \sum_{j=1}^{M-1} |V_{i,j}|$ for dimension i. (A33) may then be upper bounded as:
It is noted that the second product in (A34) also depends on xm. In some cases, λi may be negative. Therefore, the integral cannot be written in terms of the Q-function.
To illustrate the difference between the Gaussian approximation and the non-Gaussian approximation (as was similarly described previously herein), an example may be considered using a 4-PAM constellation {−3,−1,1,3}. Two bits (b1,b2) may be mapped to the constellation via Gray mapping: (0,1)→−3, (0,0)→−1, (1,0)→1, (1,1)→3. Let Pr(b1=1)=0.6 and Pr(b2=1)=0.8. This results in Pr(X=−3)=0.32, Pr(X=−1)=0.08, Pr(X=1)=0.12, and Pr(X=3)=0.48. The pdf of non-Gaussian and Gaussian approximations may be compared, such as described previously herein, where the parameters in non-Gaussian are obtained using (A32).
The areas between 2i and 2i−2, i=−1,0,1,2 are 0.3130, 0.0906, 0.1049, 0.4915 for the non-Gaussian approximation and 0.1480, 0.2909, 0.3348, 0.2263 for the Gaussian approximation. It is apparent that the Gaussian approximation does not match the discrete distribution when some bits are not reliable. This problem is especially severe when each symbol contains more than 2 bits. This may be a reason why the performance of PDA is not good for high order modulations. Note that in this case a<0 in (A30).
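The comparison above can be explored with the following Python sketch. It rests on several assumptions that are not stated in the text: both densities are renormalized over the bounded range from −4 to 4, the non-Gaussian exponent is fitted in the log domain so that it passes exactly through the three most probable symbols, and the integrals are evaluated with a simple trapezoid rule. All function names are hypothetical.

```python
import math

def interval_areas(pdf, edges, steps=2000):
    """Integrate pdf over each [edges[i], edges[i+1]] with the trapezoid rule
    and normalize so that the areas sum to one over the bounded range."""
    raw = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        h = (hi - lo) / steps
        inner = sum(pdf(lo + k * h) for k in range(1, steps))
        raw.append(h * (0.5 * (pdf(lo) + pdf(hi)) + inner))
    total = sum(raw)
    return [area / total for area in raw]

def gaussian_pdf(pmf):
    """Unnormalized Gaussian with the matched mean and variance of pmf."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return lambda x: math.exp(-(x - mu) ** 2 / (2.0 * var))

def non_gaussian_pdf(pmf):
    """exp(-(c + 2*r*x + a*x^2)) through the three most probable symbols."""
    x0, x1, x2 = sorted(pmf, key=pmf.get, reverse=True)[:3]
    t0, t1, t2 = (-math.log(pmf[x]) for x in (x0, x1, x2))
    # Newton divided differences give the interpolating quadratic.
    d01 = (t1 - t0) / (x1 - x0)
    d12 = (t2 - t1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    two_r = d01 - a * (x0 + x1)
    c = t0 - two_r * x0 - a * x0 * x0
    return lambda x: math.exp(-(c + two_r * x + a * x * x))

# 4-PAM example from the text: Gray mapping, Pr(b1=1)=0.6, Pr(b2=1)=0.8.
p_b1, p_b2 = 0.6, 0.8
pmf = {-3: (1 - p_b1) * p_b2, -1: (1 - p_b1) * (1 - p_b2),
       1: p_b1 * (1 - p_b2), 3: p_b1 * p_b2}
edges = [-4, -2, 0, 2, 4]
gauss_areas = interval_areas(gaussian_pdf(pmf), edges)
non_gauss_areas = interval_areas(non_gaussian_pdf(pmf), edges)
```

Under these assumptions the computed areas land close to the values quoted above, with the non-Gaussian areas tracking the discrete probabilities much more closely than the Gaussian areas do.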
For general bit mappings, due to the constraint on the polynomial order, (A30) may not fit the pmf for all symbols in large constellations. It is noted that Gray mapping does not perform well in iterative joint detection and decoding. Other mappings such as set partitioning mapping have better performance. For set partitioning mapping, the polynomial order constraint may be resolved via constellation decomposition. For example, let bi=+1 for logical one and bi=−1 for logical zero. The set partitioning mapping for 2C-PAM (={−2C+1,−2C+3, . . . ,2C−3,2C−1}) may be written as:
where $d = [1, 2, \ldots, 2^{C-1}]^T$ and b=[b0, . . . ,bC−1]T. It is noted that the scaling factor used in the modulation to keep unit average power has been ignored. As each entry of b takes BPSK values, the continuous approximation to the pmf is given in (A28).
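The weighted-sum form of this mapping can be illustrated with the following Python sketch. It assumes that the symbol is the inner product of d with the ±1 bit vector b, with the k-th weight equal to 2 raised to the power k, which is inferred from the definitions of d and b given above; the function names are hypothetical.

```python
import itertools

def set_partition_symbol(bits):
    """Symbol x = d^T b with d = [1, 2, ..., 2^(C-1)] and bits in {+1, -1}."""
    return sum((1 << k) * b for k, b in enumerate(bits))

def constellation(num_bits):
    """All symbols reachable with num_bits bits, sorted ascending."""
    return sorted(
        set_partition_symbol(b)
        for b in itertools.product((-1, 1), repeat=num_bits)
    )
```

With two bits this yields {−3, −1, 1, 3} and with three bits it yields the eight odd integers from −7 to 7, so every PAM point corresponds to exactly one bit vector.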
We may define H̃−m=H−m diag{d1T, . . . ,dm−1T,dm+1T, . . . ,dMT} and b−m=[b1T, . . . ,bm−1T,bm+1T, . . . ,bMT]T. By replacing H−m with H̃−m and x−m with b−m in (A33), a similar form as (A34) may be obtained. The only difference is that the new eigenvalue λ̃i is nonnegative. Therefore, (A34) may be rewritten as:
where Ā−m and r̃−m are defined similarly as H̃−m. This approach may be extended to other similar bit mappings resulting in constellation partitioning.
In various embodiments, an a priori K-best processing implementation may be used in computing an LLR value to provide potential performance and/or efficiency advantages. It is noted that the LSD only considers the maximum term among all the 2Σm=1MCM−1 terms in (A7), and the list is generated by using Pr(y|x1, . . . ,xM) only without using the a priori information Pr(xm′), m′=1, . . . ,M. Moreover, when the LSD comes to the i-th data stream, it only checks the symbols satisfying:
where the QR decomposition of H is H=QR, Rij is the (i,j)-th entry of R, $\tilde{y} = Q^H y$ and x̃j is the trial value of xj. Using (A37) does not consider the effect of choosing xi on the data streams 1, . . . ,i−1. On the other hand, the Gaussian approximation algorithm described previously considers the summation in (A7) but the Gaussian approximation is not good for high order constellations.
Accordingly, a processing implementation using both approaches may be used. In particular, a Gaussian approximation and/or a non-Gaussian approximation may be used as a metric to guide the K-best list search, taking into account the effects of stream i on streams 1, . . . ,i−1.
As with an LSD implementation, it may also be desirable to find a list of K lattice points. However, distinct from LSD, it may be desirable to try to find a list Li,±1, containing K points for each bi=±1. The LLR value of the bit bi in (A7) may then be approximated as:
Another difference from the LSD approach may be using a sum-log approach rather than a max-log approach. Yet another difference relates to how the list is generated. For example, it may be desirable to find K lattice points x∈Xi,±1 such that Pr(x|y) is maximized, rather than such that Pr(y|x) is maximized, where the a priori information is exploited in the former case.
There are several ways to generate the list using a modified K-best algorithm; these may be denoted as the sum-algorithm and the max-algorithm. In the sum-algorithm approach, at the initial step, assuming that bi belongs to data stream m, we may first check each $\tilde{x}_m \in X^m_{i,\pm 1}$ to find the K candidates such that Pr(x̃m|y) is maximized and add m into a set V. This can be written as:
Direct computation of (A39) requires a summation over $2^{\sum_{m=1}^{M} C_m - 1}$ terms, which may be computationally prohibitive. As described previously, the summation in (A39) may be replaced by an integral as:
$\Pr(\tilde{x}_m \mid y) \propto \int \Pr(y \mid x_{-m}, \tilde{x}_m)\, f(x_{-m})\, dx_{-m}.$ (A40)
where f(x−m) is the matched pdf of x−m, which could be either Gaussian or non-Gaussian. For example, with a Gaussian approximation:
$\Pr(\tilde{x}_m \mid y) \propto \exp\left(-(y - H_{-m}\mu_{-m} - h_m\tilde{x}_m)^H R_m^{-1} (y - H_{-m}\mu_{-m} - h_m\tilde{x}_m)\right)$ (A41)
where μ−m and Rm are as defined previously. The K candidates x̃m with the largest Pr(x̃m|y) in (A41) may be added into a list, which may be initialized to be Ø.
The processing may then go to x1,x2, . . . ,xM. Before it reaches xj, j≠m, we may have V={m,1, . . . ,j−1} and the list contains K candidates, each of which has the form xv=[xm,x1, . . . ,xj−1]T.
For each xv in the list, we may then compute Pr(xv,x̃j|y) for each candidate x̃j in the constellation of stream j. Among the resulting candidate vectors $[x_v^T, \tilde{x}_j]^T$, whose number is K times the constellation size of stream j, we may only choose K of them such that Pr(xv,x̃j|y) is maximized, update the list with the K chosen vectors, and add j into V. Pr(xv,x̃j|y) may be approximated in the same manner as in equation (A40). In the case of use of Gaussian approximation, we have:
$\Pr(x_v, \tilde{x}_j \mid y) \propto \exp\left(-(y - H_{-\{v,j\}}\mu_{-\{v,j\}} - H_v x_v - h_j\tilde{x}_j)^H R_{\{v,j\}}^{-1} (y - H_{-\{v,j\}}\mu_{-\{v,j\}} - H_v x_v - h_j\tilde{x}_j)\right).$ (A42)
where μ−A consists of the entries of μ that are not in A, H−A consists of the columns of H that are not in A, and
$R_{\{v,j\}} = H_{-\{v,j\}} \operatorname{diag}\{v_{-\{v,j\}}^2\} H_{-\{v,j\}}^H + \sigma^2 I_N.$ (A43)
The processing then ends when j=M.
In another implementation using the max processing algorithm, where Pr(xv,x̃j|y) is maximized consecutively, Pr(x̃|y) may be maximized directly. At the first step, for each $\tilde{x}_m \in X^m_{i,\pm 1}$, the corresponding x̃−m may be found such that:
where X−m includes all possible lattice points. The K values of x̃m for which Pr(x̃m,x̃−m|y) is largest may be put into the list, and m may be added into a set V. As solving (A44) has a high computational complexity, Pr(x̃m,x−m) may be replaced with a continuous Gaussian or non-Gaussian approximation, and the discrete set X−m may be relaxed into a continuous set C−m.
When C−m is bounded, the boundary on xj is defined by the largest and smallest elements in the constellation. For example, when the constellation is {−3,−1,1,3}, −3≤xj≤3 may be chosen. When the non-Gaussian approximation in (A30) is used, (A45) needs to be solved:
As (A45) is quadratic in x−m, when the objective function of (A45) is convex, x́−m may be found using convex optimization methods. If not, a local minimum around the following may be found:
x̃−m=x̂−m may be used, or x̀−m may be mapped to the closest lattice point in X−m. Compared with (A37), (A45) uses the a priori information through r−m and A−m, and it accounts for the effect of symbol x̃m on Pr(x̃m,x̃−m|y).
The process may then go to x1,x2, . . . ,xM. Before it reaches xj, j≠m, V={m, 1, . . . ,j−1} and the list contains K candidates, each of which has the form xv=[xm,x1, . . . ,xj−1]T. For each x̃v in the list and each candidate x̃j in the constellation of stream j, we may find the corresponding x̃−{v,j} such that:
Among the resulting candidate vectors $[\tilde{x}_v^T, \tilde{x}_j]^T$, whose number is K times the constellation size of stream j, we may only choose K of them such that Pr(x̃v,x̃j,x̃−{v,j}|y) is maximized, update the list with the K chosen vectors, and add j into V.
As in (A45), x̃−m may be approximated by solving:
where the notations are similar to those in (A42) and (A45).
It is noted that the difference between the sum-algorithm and the max-algorithm lies in the fact that the effects of x−{v,j} are removed from Pr(x̃v,x̃j,x−{v,j}|y) by summing over all possible x−{v,j} in the former case, while we take the x−{v,j} maximizing this probability in the latter case. When C−m is unbounded and Gaussian approximation is used, it can be seen that solving (A44) is equivalent to solving:
where $\Lambda_{-m} = \operatorname{diag}\{v_1^2, \ldots, v_{m-1}^2, v_{m+1}^2, \ldots, v_M^2\}$.
The basic algorithms can also be extended in various ways. Some examples of these variations are described below.
Common List Algorithm: Using the two basic list algorithms, two lists (one for +1 and the other for −1) for each bit's LLR computation need to be found. When the total number of bits is large, this may incur a high computational complexity. To reduce the complexity, the same list may be used for all bits' LLR computation. The list may be generated by choosing the K lattice points such that Pr(x|y) is maximized. Both the sum-algorithm and the max-algorithm can be used for this purpose. Different from the basic algorithms which start from xm, we may start from x1 to x2, . . . in the common list algorithm, where xj is from j ∀j={1, . . . ,M}. Finally, the LLR value of the bit bi is then approximated as:
When x∈Xi±1∩=Ø, the LSD in [4] proposes using a predetermined saturated LLR value ±B, e.g., B=8. We propose using Σx
where Ci,±1 is the real relaxation of Xi,±1.
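As a hedged sketch of the common list computation, the following assumes that the LLR of a bit is approximated by the log-ratio of the summed list probabilities for which that bit is +1 and −1, and it uses the saturated value ±B attributed above to the LSD in [4] when one of the two subsets is empty (not the alternative proposed in the text). The function name `common_list_llr` and the dictionary representation of the list are assumptions made for this example.

```python
import math
from typing import Dict, Tuple


def common_list_llr(
    list_probs: Dict[Tuple[int, ...], float],
    bit_index: int,
    saturation: float = 8.0,
) -> float:
    """Approximate the LLR of one bit from a single shared candidate list.

    `list_probs` maps each candidate bit vector (entries +1 or -1) to an
    unnormalized Pr(x|y). This layout is an assumption of the sketch.
    """
    plus = sum(p for bits, p in list_probs.items() if bits[bit_index] == 1)
    minus = sum(p for bits, p in list_probs.items() if bits[bit_index] == -1)
    if plus == 0.0 and minus == 0.0:
        return 0.0  # no information about this bit in the list
    if minus == 0.0:
        return saturation  # no candidate with the bit at -1: saturate at +B
    if plus == 0.0:
        return -saturation  # no candidate with the bit at +1: saturate at -B
    return math.log(plus / minus)


# Toy list in which every candidate has bit 0 equal to +1.
probs = {(1, 1): 0.6, (1, -1): 0.3}
llr_bit0 = common_list_llr(probs, 0)
llr_bit1 = common_list_llr(probs, 1)
```

Because the same list serves every bit, the list is generated once and only the two partial sums change from bit to bit.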
Parallel Algorithm: In the basic algorithms, the list is generated by visiting xm,x1, . . . ,xM sequentially. We can also generate the list in parallel by generating a separate list for each xi, where the list for xi is generated by choosing the best Ki elements in Qi to maximize Pr(xi|y). In this case, the overall list is the Cartesian product of the M per-symbol lists, which is of size $K=\prod_{i=1}^{M}K_i$. Using this approach, the per-symbol lists can be generated in parallel, which is suitable for hardware implementation.
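The parallel construction may be illustrated as follows. This is a minimal sketch in which the per-symbol posteriors Pr(xi|y) are assumed to be available as dictionaries; the name `parallel_list` and that data layout are assumptions of the example.

```python
import heapq
from itertools import product
from typing import Dict, List, Sequence, Tuple


def parallel_list(
    marginals: Sequence[Dict[float, float]],
    list_sizes: Sequence[int],
) -> List[Tuple[float, ...]]:
    """Build the overall list as a Cartesian product of per-symbol lists.

    marginals[i] maps each constellation point of symbol i to Pr(x_i|y);
    list_sizes[i] is K_i. Each per-symbol list is independent of the
    others, so the selections could run in parallel.
    """
    per_symbol = [
        heapq.nlargest(k_i, probs, key=probs.get)
        for probs, k_i in zip(marginals, list_sizes)
    ]
    return list(product(*per_symbol))


# Toy usage: one binary symbol and one 4-PAM symbol.
marginals = [
    {-1.0: 0.2, 1.0: 0.8},
    {-3.0: 0.1, -1.0: 0.5, 1.0: 0.3, 3.0: 0.1},
]
full_list = parallel_list(marginals, [1, 2])
```

The resulting list has K1·K2 entries, matching the product form of the list size given above.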
Bit-wise Algorithm: The basic algorithms proceed from symbol to symbol. However, both algorithms can also run on bits. For example, when set partitioning mapping is used, as shown in equation (A35), the 2C-PAM can be written as a weighted sum of bits. Both algorithms can work on bits by replacing x in both algorithms with b using (A35).
Bit-wise algorithms can also be derived for arbitrary mappings. The sum-algorithm may be considered as an example. To compute L(bi|y), we may start with bi and compute Pr(bi=±1|y)=Σx∈Xi,±1Pr(x|y). In (A40), every xj except xm may be replaced with a Gaussian or non-Gaussian continuous variable, and Pr(bi=±1|y) may be computed by summing over all possible xm in Xi,±1m. Alternatively, xm may also be approximated as a continuous variable. For example, when xm is assumed to be Gaussian, the matched mean and variance may be determined as:
When the non-Gaussian distribution is used, the distribution may be obtained by fitting the distribution over the symbols in Xi,±1m only. The probability Pr(bi|y) may be obtained as in (A41). Consider next the point at which the algorithm reaches bit bj, whose corresponding symbol is xm′, where symbols xm′+1, . . . ,xm−1,xm+1, . . . ,xM have not been visited. For example, let bj=[b1, . . . ,bj,bi]T. For any {tilde over (b)}j from the list , we can compute the matched mean and variance for xm′ as
where Xbj,{tilde over (b)}{tilde over (j)}m′ is the set of constellation points for xm′ such that the corresponding bits in bj are equal to {tilde over (b)}j. The rest of the algorithm may be implemented in a fashion the same as or similar to that of the symbol-based algorithm.
A potential advantage of the bit-wise algorithm is that some symbols may be pruned early when the first few bits of the corresponding symbols are not chosen in the list with K elements.
Early Stopping and Varied K: As described above, the basic algorithms generally stop after reaching xm. However, we may stop the algorithm at any xj. In this case, the LLR value may be determined as:
may then be approximated by using a Gaussian or non-Gaussian approximation. The stopping level gives a tradeoff between performance and complexity. Early stopping may also be used when some symbols are not reliable, e.g., when every symbol in the constellation has roughly the same probability. In this case, different candidates may have roughly the same metric, so choosing the best K candidates may not be meaningful. The symbols may be reordered such that the unreliable symbols correspond to the last few symbols, and early stopping may be used when the algorithm reaches the unreliable symbols.
The list size K may also be varied for different symbols. The list size may be chosen as Kj after symbol xj is visited. For example, Kj can be chosen to be a large value for the first few visited symbols, as the choice of these symbols is important to the overall performance, and Kj can be chosen to be a small value when the algorithm is close to the end to save complexity.
In practical protocols, some CRC check bits are always present. When a particular data stream passes the CRC check, this data stream does not need to be included in future iterations of demodulation and decoding. For example, this data stream may be cancelled directly or using hard SIC.
Some aspects of the disclosure relate to complexity reduction, such as described previously herein with respect to matrix inversion. As noted previously, direct computation of (A42) or equation (4) requires matrix inversion and matrix multiplication for every {tilde over (x)}j∈j. From the expression of R{v,j} in (A43) and the matrix inversion lemma, we have:
where $g_j=R_v^{-1}h_j$. Initially, we need to compute $(H\,\mathrm{diag}\{v^2\}H^H+\sigma^2 I_N)^{-1}$, which has a complexity of $O(N^{2.376}+NM^2)$. Substituting (A56) into (A42), we obtain:
Computing $h_j^Hg_j$ and $g_j^Hy$ needs 2(N−1) additions and 2N multiplications. $y^HR_v^{-1}y$ and $y-H_{-v}\mu_{-v}$ are inherited from the previous step. $H_vx_v$ is updated and stored in the list, and the update needs KN multiplications and KN additions. Computing $y-H_{-v}\mu_{-v}+h_j\mu_j-H_vx_v$ needs N multiplications and 2N additions.
The total number of additions to compute the coefficients A,B,C for all the elements in the list is 3(K+1)N+K−2 and the total number of multiplications is (2K+3)N+5. As equation (A57) is a scalar function in {tilde over (x)}j, we can search over j for each xv to find the K candidates with the maximum (A42). This algorithm requires 2K|j| multiplications and 2K|j| additions, thereby reducing computation complexity.
Another implementation finds, for each xv, the κ values of {tilde over (x)}j that maximize (A57), e.g., κ=4. The list may then be updated from the resulting κK candidates. To find the best κ values for (A57), properties of the second-order polynomial may be used as follows.
Let l be the index of the constellation point for xj that is closest to B/A. If B/A>j(l), the best κ {tilde over (x)}j are simply j(l),j(l+1),j(l−1),j(l+2), . . . If B/A<j(l), the best κ {tilde over (x)}j are j(l),j(l−1),j(l+1),j(l−2), . . . The total complexity of the basic sum-algorithm is $O(N^{2.376}+\sum_{m=1}^{M}C_mK(MN+\kappa)+NM^2)$. When a common list is used, the complexity becomes $O(N^{2.376}+K(MN+\kappa)+NM^2)$.
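The zigzag enumeration described above may be sketched as follows. The sketch assumes a real constellation that is sorted in ascending order and uniformly spaced, and a metric that is a concave quadratic in the symbol with its peak at B/A, so that closeness to the peak ranks the candidates; the function name `best_kappa` and the handling of constellation edges are assumptions of the example.

```python
from typing import List, Sequence


def best_kappa(constellation: Sequence[float], peak: float, kappa: int) -> List[float]:
    """Return the kappa constellation points nearest to peak = B/A.

    Assumes `constellation` is sorted ascending with uniform spacing and
    kappa >= 1. Points are visited in zigzag order around the closest one.
    """
    n = len(constellation)
    # Index l of the constellation point closest to the peak.
    l = min(range(n), key=lambda i: abs(constellation[i] - peak))
    # Visit the side on which the peak lies first.
    step = 1 if peak > constellation[l] else -1
    wanted = min(kappa, n)
    order = [l]
    offset = 1
    while len(order) < wanted:
        for idx in (l + step * offset, l - step * offset):
            # Skip indices that fall off either end of the constellation.
            if 0 <= idx < n and len(order) < wanted:
                order.append(idx)
        offset += 1
    return [constellation[i] for i in order]


pam8 = [-7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0]
interior = best_kappa(pam8, 1.6, 4)
edge = best_kappa(pam8, 6.5, 3)
```

No metric evaluation is needed to rank the points; only the location of the peak relative to the closest constellation point is used.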
Some aspects of the disclosure relate to channel inversion as may be applied in MIMO-OFDM. As described previously, computing $(H\,\mathrm{diag}\{v^2\}H^H+\sigma^2 I_N)^{-1}$ constitutes a large portion of the total complexity. In MIMO-OFDM, different subcarriers have different channels H, so this inverse generally needs to be computed for each subcarrier. Moreover, each iteration between the demodulator and the decoder gives a new $v^2$, and the matrix inversion needs to be recomputed for each iteration.
To reduce the complexity of the matrix inversion, $v^2$ may be replaced with a 0-1 vector ξ: when $v_j^2$ is greater than a threshold (e.g., 0.5), choose ξj=1, and choose ξj=0 otherwise. When $v_j^2$ is large, the symbol is not reliable and we may assume the symbol is uniformly distributed, which results in ξj=1. On the other hand, when $v_j^2$ is small, the symbol is reliable and we may use a hard decision on this symbol, resulting in ξj=0. Therefore, from (A56), only $(H_iH_i^H+\sigma^2 I_N)^{-1}$ needs to be computed at the i-th subcarrier.
In MIMO-OFDM systems, adjacent subcarriers have similar $H_i$ and hence similar $(H_iH_i^H+\sigma^2 I_N)^{-1}$. This correlation may be used to reduce the complexity of computing the matrix inversion. For example, when the channels between each transmit and receive antenna are flat fading, all $H_i$ are identical and the matrix inversion needs to be computed only once, reducing the complexity by a factor of $N_s$.
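The thresholding step may be illustrated with the short sketch below. The function name `quantize_variances` and the returned index lists are assumptions of the example; the threshold of 0.5 is the example value mentioned above.

```python
from typing import List, Sequence, Tuple


def quantize_variances(
    variances: Sequence[float], threshold: float = 0.5
) -> Tuple[List[int], List[int], List[int]]:
    """Replace per-symbol variances by a 0-1 vector xi.

    Returns (xi, soft, hard): xi[j] is 1 when variances[j] exceeds the
    threshold, `soft` lists the unreliable symbols (treated as uniformly
    distributed), and `hard` lists the reliable symbols (hard decision).
    """
    xi = [1 if v > threshold else 0 for v in variances]
    soft = [j for j, flag in enumerate(xi) if flag == 1]
    hard = [j for j, flag in enumerate(xi) if flag == 0]
    return xi, soft, hard


xi, soft, hard = quantize_variances([0.9, 0.1, 0.5, 0.7])
```

Because ξ takes only the values 0 and 1, the matrix to be inverted no longer depends on the exact variances delivered by the decoder at each iteration.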
For example, let $\gamma=\max_{m,n,l}\lceil\tau_{n,m,l}/T_s\rceil$ in (A4). Each entry of $\Xi_i=H_iH_i^H+\sigma^2 I_N$ is a polynomial in
of order at most 2γ. The inverse of Ξi is $\Xi_i^{-1}=\mathrm{adj}(\Xi_i)/\det(\Xi_i)$,
where adj (Ξi) is the adjugate of Ξi, the matrix formed by the cofactors of Ξi. From the definition of adjugate matrix and determinant, each entry of adj (Ξi) and det (Ξi) are polynomials in
of order at most 2γ(N−1) and 2γN, respectively. If adj(Ξi) and det(Ξi) are computed on a set of at least 2γN subcarriers, we can determine the coefficients of these polynomials, and $\Xi_i^{-1}$ at subcarriers not in that set can be obtained by substituting
into the polynomials corresponding to adj(Ξi) and det(Ξi), which is a form of interpolation.
However, the complexity of computing adj(Ξi) is $O(2\gamma N^3)$, which is greater than the complexity of computing $\Xi_i^{-1}$ directly, i.e., $O(N^{2.376})$. Therefore, linear interpolation may be used instead for adj(Ξi) and det(Ξi). For example, the subcarriers on which these quantities are computed may be chosen such that the index difference of adjacent subcarriers is $D=2^l$. For any two such adjacent subcarriers i and j with j−i=D, we may first compute Δ=(adj(Ξj)−adj(Ξi))/D and δ=(det(Ξj)−det(Ξi))/D, which can be computed efficiently using bit shifting. For any subcarrier i<k<j, we have:
which needs only $N^2$ additions and $N^2$ multiplications. The parameter D gives a tradeoff between performance and complexity.
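A minimal sketch of the linear interpolation is given below. It assumes that the adjugate and determinant are advanced by one increment per subcarrier and that the inverse is then formed as adjugate times the reciprocal determinant; real-valued entries are used for brevity, although the same code accepts Python complex numbers. The function name `interpolate_inverses` and the nested-list matrix layout are assumptions of the example.

```python
from typing import List

Matrix = List[List[float]]


def interpolate_inverses(
    adj_i: Matrix, det_i: float, adj_j: Matrix, det_j: float, spacing: int
) -> List[Matrix]:
    """Approximate the inverses on the subcarriers strictly between i and j.

    `spacing` is D = j - i. The adjugate and determinant are interpolated
    linearly and the inverse is formed as adj / det on each subcarrier.
    """
    n = len(adj_i)
    # Per-subcarrier increments (a shift when D is a power of two in fixed point).
    delta_adj = [[(adj_j[r][c] - adj_i[r][c]) / spacing for c in range(n)]
                 for r in range(n)]
    delta_det = (det_j - det_i) / spacing

    adj_k = [row[:] for row in adj_i]
    det_k = det_i
    inverses: List[Matrix] = []
    for _ in range(1, spacing):
        # N^2 additions to advance the adjugate by one subcarrier.
        adj_k = [[adj_k[r][c] + delta_adj[r][c] for c in range(n)]
                 for r in range(n)]
        det_k += delta_det
        # N^2 multiplications by the reciprocal of the determinant.
        inv_det = 1.0 / det_k
        inverses.append([[adj_k[r][c] * inv_det for c in range(n)]
                         for r in range(n)])
    return inverses


# Toy 2x2 example with D = 4 (three interior subcarriers).
approx = interpolate_inverses([[1.0, 0.0], [0.0, 1.0]], 2.0,
                              [[5.0, 4.0], [4.0, 5.0]], 6.0, 4)
```

A larger spacing D reduces the number of exact adjugate and determinant computations at the cost of a coarser approximation between the computed subcarriers.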
Some aspects of the disclosure relate to column reordering. Column reordering of the channel matrix H has been recognized to be important to the performance of uncoded MIMO systems. If optimal joint MAP detection and decoding is used as shown in, for example, (A7), column reordering does not help. When the successive interference cancelation based algorithms described previously are considered, the order of processing different data streams may affect the computation of LLR value. Moreover, in practical systems, there may be only a single channel decoder.
Consequently, decoding of data streams may be done sequentially, and after a data stream is decoded, its updated a priori information may be used for decoding of the remaining streams. This approach is different from other algorithms in which the updated a priori information is used only for the next decoding iteration, not for the current one. In this case, different channel matrix reorderings may lead to different convergence rates and performance.
Also different from techniques where only a single channel matrix is considered, the bits of each data stream in MIMO-OFDM span several subcarriers. In addition, the whole data stream needs to be decoded before processing the next data stream. Therefore, the channel matrices on all subcarriers should preferably be reordered in the same way. The difficulty is in taking all subcarriers' channels into account. To address this, a data stream with good channel conditions should preferably be decoded first such that the probability of successful decoding is high, and other data streams can benefit from this.
Two possible reordering schemes are described below. First, the reordering may be performed according to the average SNR across the subcarriers, i.e.,
where hi,j is the j-th column of Hi and Hi,−j consists of the columns of Hi except column j. The data stream with the largest value of (A60), denoted m, may be decoded first. Stream m may then be added to a set S, which is initialized to Ø. In the next step, it may be assumed that stream m can be perfectly canceled, and the next stream to decode may be found according to:
The data stream with the largest (A61) not in S may be decoded next, and this data stream is added to S. The process may then continue until all the data streams have been added to S.
In another implementation, capacity may be averaged to reorder the channel matrix rather than the average SNR. This may be done by replacing (A61) with
A Gaussian channel capacity formula is used in (A62). However, capacity formulas for finite constellations may also be used. The computation of (A61) may be done iteratively as in (A56) and (A57). To reduce complexity, the data streams may also be reordered based only on (A60), without stream cancelation. In this case, we need only compute the SNR or capacity K times rather than K(K+1)/2 times.
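The ordering loop with stream cancelation may be sketched as below. Because the averaged SNR and capacity expressions are not reproduced here, the sketch takes the metric as a caller-supplied function; the name `greedy_stream_order`, the metric signature, and the toy metric are all assumptions of the example rather than the expressions of the disclosure.

```python
from typing import Callable, FrozenSet, List


def greedy_stream_order(
    num_streams: int,
    metric: Callable[[int, FrozenSet[int]], float],
) -> List[int]:
    """Order streams by repeatedly picking the best remaining stream.

    metric(j, cancelled) is a hypothetical callable returning the averaged
    SNR or capacity of stream j, assuming the streams in `cancelled` have
    been perfectly canceled.
    """
    order: List[int] = []
    cancelled: set = set()
    while len(order) < num_streams:
        snapshot = frozenset(cancelled)
        remaining = [j for j in range(num_streams) if j not in cancelled]
        best = max(remaining, key=lambda j: metric(j, snapshot))
        order.append(best)
        cancelled.add(best)  # assume the decoded stream is perfectly canceled
    return order


# Toy metric: gain of the stream over noise plus uncanceled interference.
gains = [3.0, 5.0, 1.0]
calls: List[int] = []


def toy_metric(j: int, cancelled: FrozenSet[int]) -> float:
    calls.append(j)
    interference = sum(g for k, g in enumerate(gains)
                       if k != j and k not in cancelled)
    return gains[j] / (1.0 + interference)


order = greedy_stream_order(len(gains), toy_metric)
```

With three streams the metric is evaluated 3+2+1 times, which illustrates the quadratic growth in evaluations noted above when cancelation is taken into account.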
The ordering can also be improved using the a priori information. When we consider the mutual information between yi and xi in (A5), for data stream j, (A5) can be written as:
$$y_i = H_i\mu_i + h_{i,j}\tilde{x}_{i,j} + H_{i,-j}\tilde{x}_{i,-j} + w_i, \tag{A63}$$
where {tilde over (x)}i,j is assumed to be Gaussian with mean zero and variance vi,j2. As a constant does not change mutual information, we have:
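$$I(y_i;x_i)=\log\left(1+h_{i,j}^H\left(H_{i,-j}\Lambda_{i,-j}H_{i,-j}^H+\sigma^2 I_N\right)^{-1}h_{i,j}\right), \tag{A64}$$
where $\Lambda_{i,-j}=\mathrm{diag}\{v_{i,1}^2,\ldots,v_{i,j-1}^2,v_{i,j+1}^2,\ldots,v_{i,M}^2\}$. Equation (A64) can be used instead of (A62) to reorder the data streams.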
In some configurations, the apparatus for wireless communication includes means for performing various functions as described herein. In one aspect, the aforementioned means may be a processor or processors and associated memory in which embodiments reside, and which are configured to perform the functions recited by the aforementioned means. The aforementioned means may be, for example, modules or apparatus residing in UEs, eNBs, and/or other wireless network nodes, to perform the functions as are described herein. In another aspect, the aforementioned means may be a module or apparatus configured to perform the functions recited by the aforementioned means.
In one or more exemplary embodiments, the functions, methods and processes described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
It is understood that the specific order or hierarchy of steps or stages in the processes and methods disclosed are examples of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged while remaining within the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps or stages of a method, process or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. A phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover: a; b; c; a and b; a and c; b and c; and a, b and c.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. It is intended that the following claims and their equivalents define the scope of the disclosure.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 61/307,768, entitled LOW COMPLEXITY HIGH PERFORMANCE ITERATIVE DECODING IN MULTIPLE INPUT MULTIPLE OUTPUT SYSTEMS, filed on Feb. 24, 2010, the content of which is hereby incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61307768 | Feb 2010 | US