BRIEF DESCRIPTION OF THE FIGURES
  The above and other objects and advantages of the invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
  
    FIG. 1 is a high level block diagram of a multiple-input multiple-output data transmission or storage system;
  
    FIG. 2 is a wireless transmission system in accordance with one embodiment of the system in FIG. 1;
  
    FIG. 3 is a block diagram of a transmitter;
  
    FIG. 4A is a signal constellation set for quadrature amplitude modulation with four signal points;
  
    FIG. 4B is a signal constellation set for quadrature amplitude modulation with 16 signal points;
  
    FIG. 5 is a vector model of the system in FIG. 1;
  
    FIG. 6A is a flow diagram of a stop-and-wait HARQ transmitter;
  
    FIG. 6B is a flow diagram of a HARQ receiver;
  
    FIG. 7 is a high level block diagram of a receiver;
  
    FIG. 8 is a detailed embodiment of FIG. 7 for a single input, single output (SISO) system;
  
    FIG. 9 is a diagram illustrating symbol-level combining in a 4-QAM system using weighted addition;
  
    FIGS. 10A-10B show subsets of signal points in a 4-QAM signal constellation set;
  
    FIGS. 11-12 show detailed embodiments of FIG. 7 for a MIMO system;
  
    FIG. 13 shows a detailed embodiment of FIG. 7 for a MIMO system that utilizes QR decomposition;
  
    FIG. 14 shows a detailed embodiment of FIG. 7 for a MIMO system that utilizes Cholesky factorization;
  
    FIG. 15 shows an illustrative flow diagram for decoding a signal vector from a decoding metric;
  
    FIG. 16 shows an illustrative flow diagram for decoding a signal vector in a 2×2 MIMO system employing the strategy of FIG. 15;
  
    FIG. 17 shows an illustrative flow diagram for decoding a signal vector in a 3×3 MIMO system employing the strategy of FIG. 15;
  
    FIG. 18A is a block diagram of an exemplary hard disk drive that can employ the disclosed technology;
  
    FIG. 18B is a block diagram of an exemplary digital versatile disc that can employ the disclosed technology;
  
    FIG. 18C is a block diagram of an exemplary high definition television that can employ the disclosed technology;
  
    FIG. 18D is a block diagram of an exemplary vehicle that can employ the disclosed technology;
  
    FIG. 18E is a block diagram of an exemplary cell phone that can employ the disclosed technology;
  
    FIG. 18F is a block diagram of an exemplary set top box that can employ the disclosed technology; and
  
    FIG. 18G is a block diagram of an exemplary media player that can employ the disclosed technology.
DETAILED DESCRIPTION
The disclosed invention provides a technique in a multiple-input multiple-output data transmission or storage system to decode a signal vector at a receiver, where the receiver may receive multiple signal vectors from the same transmitted signal vector.
  FIG. 1 shows an illustration of a basic data transmission or storage system in accordance with one embodiment of the present invention. Data, typically grouped into packets, is sent from transmitter 102 to receiver 112. During transmission, the signals may be altered by a transmission medium, represented by channel 106, and additive noise sources 108. Transmitter 102 has Nt outputs 104 and receiver 112 has Nr inputs 110, so channel 106 is modeled as a multiple-input multiple-output (MIMO) system with Nt inputs and Nr outputs. The Nt input and Nr output dimensions may be implemented using multiple time, frequency, or spatial dimensions, or any combination of such dimensions.
In one embodiment, FIG. 1 represents a wireless communication system, pictured in FIG. 2. In this embodiment, transmitter 102 is a wireless server 204, such as a commercial gateway modem, and receiver 112 is a wireless receiver 206, such as a commercial wireless computer adapter. Channel 106 is space 208 between wireless server 204 and wireless receiver 206, which obstructs and attenuates the signal due to at least multipath fades and shadowing effects. Typically, wireless communication systems use spatial dimensions to implement multiple dimensions in the form of multiple transmitting antennas 200 and receiving antennas 202.
Returning to FIG. 1, transmitter 102 prepares bit sequence 100 into signals capable of transmission through channel 106. For an uncoded system, bit sequence 100 is a binary message, where the message carries only information bits. Alternatively, for a coded system, bit sequence 100 may be an encoded version of the message. Thus, bit sequence 100 may have originated from a binary data source or from the output of a source encoder (not pictured).
One embodiment of transmitter 102 is shown in FIG. 3. Transmitter 102 converts bit sequence 100 into signals 104 appropriate for transmission through channel 106 (FIG. 1). Bit sequence 100 is passed through interleaver/encoder 300, which may interleave and/or encode bit sequence 100. If interleaver/encoder 300 performs encoding, the encoding may be based on any suitable error control code (e.g., convolutional, block, error-detecting, error-correcting, etc.). If interleaving is performed, each bit in bit sequence 100 may be assumed to be independent of all other bits in bit sequence 100. Bit sequence 306 at the output of interleaver 300 is demultiplexed by demultiplexor 308 across Nt paths 310. Each demultiplexed output 310 may or may not go through another interleaver and/or coding block 302, yielding bit sequences 312. Finally, bit sequences 312 are modulated with modulators 304, and are transmitted as signals x1, . . . , xNt, or x in vector form.
Modulators 304 group the incoming bits into symbols, which are mapped and converted to signals according to a signal constellation set and carrier signal. In one embodiment of the invention, modulator 304 uses quadrature amplitude modulation (QAM). Each symbol is mapped to a signal point in the QAM signal constellation set, where the signal points are differentiated from one another by phase and/or magnitude. For example, FIG. 4A shows a 4-QAM signal constellation set in a complex number plane. In this case, signal points 400A-400D are distinguishable only by phase. Each signal point represents a different two-bit symbol 402: 400A represents “00,” 400B represents “01,” 400C represents “11,” and 400D represents “10.” However, any other one-to-one mapping from symbol to signal point is valid.
  FIG. 4B shows a 16-QAM signal constellation set, where four-bit sequences 406 are combined into one symbol. Here, both the amplitude and the phase of signal points 404 may vary. FIG. 4B shows a partial mapping from symbols 406 to signal points 404, where each symbol is shown closest to its corresponding signal point. However, as before, any other mapping is possible. In general, an m-bit symbol may be mapped according to an M-QAM signal set, where M=2^m. Therefore, for the transmitter configuration shown in FIG. 3, transmitter 102 is capable of transmitting m·Nt bits concurrently.
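By way of illustration only, the symbol mapping described above may be sketched in Python. The coordinates assigned to each two-bit label and the helper names modulate and bits_per_vector are hypothetical; they are not taken from FIG. 4A or FIG. 3.

```python
import math

# Hypothetical Gray-coded 4-QAM map. The actual placement of signal points
# 400A-400D in FIG. 4A is not reproduced here.
QAM4 = {
    "00": complex(+1, +1) / math.sqrt(2),
    "01": complex(-1, +1) / math.sqrt(2),
    "11": complex(-1, -1) / math.sqrt(2),
    "10": complex(+1, -1) / math.sqrt(2),
}

def modulate(bits, constellation):
    """Group a bit string into m-bit symbols and map each to a signal point."""
    m = int(math.log2(len(constellation)))  # M = 2**m
    if len(bits) % m:
        raise ValueError("bit count must be a multiple of m")
    return [constellation[bits[i:i + m]] for i in range(0, len(bits), m)]

def bits_per_vector(M, n_t):
    """Bits carried concurrently by N_t streams of M-QAM symbols: m * N_t."""
    return int(math.log2(M)) * n_t

symbols = modulate("0011", QAM4)       # two 4-QAM symbols
capacity = bits_per_vector(16, 2)      # 16-QAM on two outputs
```

Any other one-to-one assignment of labels to points could be substituted for the dictionary without changing the rest of the sketch.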
In accordance with one embodiment of the present invention, transmitter 102 sends the same vector, x, multiple times according to a protocol that is also known and followed by receiver 112. Depending on the protocol, there may be additional components in transmitter 102 that are not shown in FIG. 3. It should be understood that transmitter 102 may be altered in order to implement such protocols. For example, if an automatic repeat request (ARQ) protocol is used, transmitter 102 may need a buffer to store x, or equivalently bit stream 100, in the event that a retransmission is requested.
Even though x is transmitted, receiver 112 in FIG. 1 actually receives yi, where
  
  y_i = H_i x + n_i, \quad 1 \le i \le N   (1)
For clarity, FIG. 5 shows the components of each vector in equation (1). Index i represents the ith instance that the same transmitted vector, x, is transmitted. yi is an Nr×1 vector, where each vector component is the signal received by one of the Nr inputs of receiver 112. Hi 500 is an Nr×Nt channel matrix that defines how channel 106 alters the transmitted vector, x. ni is an Nr×1 vector of additive noise. Note that the characteristics of channel 106, reflected in matrix 500, and noise sources 108, and therefore received signal 110, may be different for each instance i. Differences arise because each transmission of x occurs at a different time or through a different medium.
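The vector model above may be simulated with a short NumPy sketch. The Rayleigh-distributed channel matrices, the dimensions, and the function name receive are assumptions chosen for illustration; the description does not prescribe any particular channel statistics.

```python
import numpy as np

def receive(x, H_list, n0, rng):
    """Return [y_1, ..., y_N], where y_i = H_i x + n_i for each transmission."""
    received = []
    for H in H_list:
        n_r = H.shape[0]
        # Complex Gaussian noise with variance n0 per component.
        n = np.sqrt(n0 / 2) * (rng.standard_normal(n_r)
                               + 1j * rng.standard_normal(n_r))
        received.append(H @ x + n)
    return received

rng = np.random.default_rng(0)
n_t, n_r, N = 2, 3, 4
x = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)          # common transmit vector
# Assumed channel model: a fresh random N_r x N_t matrix for every instance i.
H_list = [(rng.standard_normal((n_r, n_t))
           + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
          for _ in range(N)]
ys = receive(x, H_list, n0=0.1, rng=rng)
```

Each element of ys is an Nr×1 vector, and the N vectors differ from one another because both the channel matrix and the noise change between instances.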
In one embodiment, noise sources 108 may be modeled as additive white Gaussian noise (AWGN) sources. In this case, noise sources 108 are independent and identically distributed (i.i.d.). That is, the noise that affects any of the Nr components in any ni does not affect the noise for any other component in ni, and the noise at any given time does not affect the noise at any other time. Also, all of the noise sources have the same probabilistic characteristics. Furthermore, each component of ni has zero mean and is random in terms of both magnitude and phase, where the magnitude and the phase are also independent. This type of noise source is called an i.i.d. zero mean circularly symmetric complex Gaussian (ZMCSCG) noise source. If the variance of each component is N0, then the conditional probability distribution function (pdf) of the received signal, Pr{y|x,H}, is given by
  
  \Pr\{y \mid x, H\} = \frac{1}{(\pi N_0)^{N_r}} \exp\left\{ -\frac{\|y - Hx\|^2}{N_0} \right\}   (2)
  
Equation (2) will be used with reference to maximum-likelihood decoding discussed in greater detail below in connection with FIG. 10.
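As a hedged illustration, the logarithm of the conditional pdf for i.i.d. ZMCSCG noise may be evaluated as below. The sketch assumes the standard density, proportional to exp(−‖y−Hx‖²/N0) with normalization (πN0)^(−Nr); the function name log_likelihood is hypothetical.

```python
import numpy as np

def log_likelihood(y, x, H, n0):
    """log Pr{y | x, H} for ZMCSCG noise with per-component variance n0."""
    n_r = y.shape[0]
    residual = y - H @ x
    # Normalization term plus the scaled squared Euclidean distance.
    return -n_r * np.log(np.pi * n0) - np.vdot(residual, residual).real / n0

H = np.eye(2, dtype=complex)
x = np.array([1 + 0j, 0j])
ll_exact = log_likelihood(x, x, H, 0.5)        # received vector equals H x
ll_offset = log_likelihood(x + 1, x, H, 0.5)   # received vector is farther away
```

A candidate x that lies closer to the received vector yields a larger log-likelihood, which is the property a maximum-likelihood decoder relies on.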
Receiver 112 may use one or more of the N received copies of x to determine the information that was transmitted. Receiver 112 may combine multiple received vectors into a single vector for decoding, thereby utilizing more than one, and possibly all, of the transmitted signal vectors. The combining scheme disclosed in the present invention will be discussed in greater detail below in connection with FIGS. 7-11. It should be understood that the receiver in the present invention may combine all received signal vectors. Alternatively, a subset of the received signal vectors and channel matrices may be combined. For example, a received signal and the corresponding channel matrix may be discarded if the magnitude of a component in the received signal vector is below a certain threshold. Thus, the variable N should refer to the number of received signal vectors used by the receiver, which is not necessarily the same as the number of total signal vectors received.
In one embodiment of the invention, receiver 112 receives multiple instances of a common transmit vector using a retransmission protocol. For example, the transmitter and receiver may use a HARQ type-I protocol. Flow charts of the steps taken by transmitter 102 and receiver 112 are shown in FIG. 6A and FIG. 6B, respectively. FIG. 6A shows a transmitter following a stop-and-wait protocol, where the transmitter waits until a signal vector has been accepted by the receiver before sending the next signal vector. Other protocols, such as go-back-N, selective repeat, or any other suitable protocol may be used in place of stop-and-wait. Therefore, it should be understood that FIG. 6A may be modified in order to implement a different protocol.
  FIG. 6B shows a simplified flow chart of a HARQ type-I receiver protocol in accordance with one aspect of the invention. At some time, receiver 112 receives yi at step 600, corresponding to the ith transmission of x. At step 602, receiver 112 may combine all the signal vectors corresponding to transmitted signal x that have been received thus far, that is y1, . . . , yi, into a single vector, ỹ, and decode the combined vector or a processed version of the combined vector. In FIG. 6B, decoding refers to determining the CRC-protected message based on the combined signal vector. Other possible decoding outputs will be discussed in greater detail below in connection with FIG. 7. Errors in individual signal vectors may be corrected by combining the received signal vectors such that the combined signal vector, ỹ, is correctable by decoding. Following decoding, error detection is performed at step 604, which in this case involves checking the CRC of the decoded vector. If errors are detected, the receiver may send a negative acknowledgment (NACK) message to the transmitter at step 606. Upon receipt of the NACK, the transmitter may send the same transmitted signal vector, which is received at step 600 as yi+1. yi+1 may be different from yi even though the same transmit signal vector x is used at the transmitter, because yi+1 results from a later transmission than yi and is affected by different noise and/or channel characteristics. The i+1 vectors are combined and decoded, as described previously. This procedure repeats until, after combining and decoding N received vectors, no CRC error is detected. At this point, the receiver sends an acknowledgment (ACK) message at step 608 back to the transmitter to inform the transmitter that the vector has been successfully received. Also, since there are no errors in the decoded data, the receiver passes the decoded data to the destination at step 610.
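The receiver loop of steps 600 through 610 may be sketched as follows. This is a simplified illustration under stated assumptions, not the flow of FIG. 6B itself: it substitutes real-valued BPSK for QAM, assumes a unit-gain channel so that combining reduces to plain addition, and assumes the CRC value reaches the receiver intact. The names crc_ok and harq_receive are hypothetical.

```python
import random
import zlib

def crc_ok(bits, crc):
    """Error detection: compare the CRC-32 of the decoded bits to the reference."""
    return zlib.crc32(bytes(bits)) == crc

def harq_receive(tx_bits, noise_std, max_tx, rng):
    """Combine retransmissions until the CRC passes.

    Returns (decoded_bits, N) on success, or (None, max_tx) if the CRC
    still fails after max_tx transmissions.
    """
    crc = zlib.crc32(bytes(tx_bits))            # assumed to arrive without error
    signal = [1.0 if b else -1.0 for b in tx_bits]
    combined = [0.0] * len(tx_bits)
    for i in range(1, max_tx + 1):
        # Step 600: receive the i-th noisy copy of the same transmitted signal.
        y_i = [s + rng.gauss(0.0, noise_std) for s in signal]
        # Step 602: combine with everything received so far, then decode.
        combined = [c + y for c, y in zip(combined, y_i)]
        decoded = [1 if c > 0 else 0 for c in combined]
        # Step 604: error detection on the decoded bits.
        if crc_ok(decoded, crc):
            return decoded, i                   # steps 608 and 610: ACK and deliver
        # Step 606: NACK; the transmitter sends the same signal again.
    return None, max_tx

message = [1, 0, 1, 1, 0, 0, 1, 0]
result, transmissions = harq_receive(message, 0.0, 4, random.Random(0))
```

With no noise the first copy already passes the CRC; with noise, the number of transmissions N varies from packet to packet.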
In another embodiment of the invention, the transmitter sends a signal vector, x, a fixed number of times, irrespective of the presence of errors. For example, the receiver may obtain N transmissions of x from repetition coding. N copies of x may be transmitted simultaneously, or within some interval of time. The receiver combines signal vectors, y1, . . . , yN, and may decode the combination or a processed version of the combination. Repetition coding may be useful when there is no feasible backchannel for the receiver to send retransmission requests.
HARQ type-I and repetition coding are two protocols that may be used in different embodiments of the present invention. Alternatively, repetition coding and HARQ can be combined such that multiple vectors are received at step 600 before combining and decoding at step 602. The invention, however, is not limited to the two protocols and their combination mentioned here. Currently, the IEEE 802.16e standard uses HARQ and repetition coding, so these particular protocols merely illustrate embodiments of the invention. Any protocol that allows the receiver to receive multiple copies of the same transmitted vector falls within the scope of the present invention.
  FIG. 7 is a block diagram of one embodiment of receiver 112 in accordance with one aspect of the present invention. Furthermore, it illustrates one way to implement combining and decoding at step 602 in FIG. 6B. Combiner 702, which may or may not use channel information 718 provided from channel combiner/preprocessor 700, combines the symbols of the N received vectors using any suitable combining technique. This type of combining is hereinafter referred to as symbol-level combining, because the combiner operates on the symbols of the signal vector. Combined received vector 706, ỹ, can be passed to signal processor 712. Signal processor 712 may process the combined received vector to produce a new signal vector with white noise components. If the noise is already white, signal processor 712 may be bypassed or omitted from the receiver, or may perform other processing functions on the combined received signal vector. Signal processor 712 may also use channel information 716 provided by channel combiner/preprocessor 700. After the noise of the combined received vector is whitened, the processed signal vector, y′, is decoded by decoder 704. Decoder 704 may use channel information 708 provided by channel combiner/preprocessor 700 to operate on processed signal vector 710, y′. Decoder 704 may return an estimate of the signal vector, x. Decoder 704 may return soft information or hard information. If decoder 704 returns hard information, it may have been the result of hard-decoding or soft-decoding. For a coded system, decoder 704 may return coded information or decoded information.
Single-input single-output (SISO) systems are a special case of MIMO systems in which Nt=Nr=1. System 800, in FIG. 8, shows a detailed embodiment of FIG. 7 for a SISO system. First, the signals are combined by weighted addition. Weights 820 may be chosen to maximize the signal-to-noise ratio (SNR), a technique called maximal ratio combining (MRC). For MRC or other weighted addition combining, weights 820 may be functions of channel information 808 determined by combiner 800. Following combining by symbol combiner 802, combined received signal 806 may be decoded using maximum-likelihood (ML) decoder 804.
  FIG. 9 shows an example of a weighted addition combining, HARQ receiver of the configuration shown in FIG. 8. The signal constellation set is 4-QAM, which was described above in connection with FIG. 4A. Signal points 900A-900D represent the magnitude and phase of a transmitted symbol. For illustration purposes, assume that the transmitter is sending the symbol, “00” (902A), to the receiver using a HARQ type-I protocol. Assume, again for the purpose of illustration, that the channel does not attenuate, amplify, or alter the signal in any way. Therefore, ideally, a symbol with the magnitude and phase of signal point 900A would be received. However, if due to additive noise, a signal with a magnitude and phase of signal point 904 is actually received, it will be incorrectly decoded as “01,” because it is closer to signal point 900B than 900A. Note that an ML decoder may make this decision if the noise is assumed to be AWGN. The error-detecting code may then detect the presence of the bit error, resulting in a request for a retransmission. On the second transmission, a signal corresponding to signal point 906 is received. If signal point 906 is decoded on its own, it may be incorrectly decoded as “10.” However, by weighted addition of signal points 904 and 906, the resulting combined symbol may fall approximately on dotted line 908. The combined symbol is now closest to signal point 900A and will be decoded correctly as “00.” Thus, the receiver configuration shown in FIG. 8 may be used to effectively decode multiple received signal vectors.
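The example of FIG. 9 may be reproduced numerically. The coordinates below are hypothetical stand-ins for signal points 900A-900D and for received points 904 and 906, and equal weights are assumed because the channel in this example leaves the signal unaltered.

```python
import math

# Hypothetical coordinates; FIG. 9 itself is not reproduced here.
POINTS = {
    "00": complex(+1, +1) / math.sqrt(2),
    "01": complex(-1, +1) / math.sqrt(2),
    "11": complex(-1, -1) / math.sqrt(2),
    "10": complex(+1, -1) / math.sqrt(2),
}

def nearest(y):
    """Hard decision: label of the signal point closest to y."""
    return min(POINTS, key=lambda label: abs(y - POINTS[label]))

y1 = complex(-0.2, 0.9)            # first reception of "00", pushed toward "01"
y2 = complex(0.9, -0.3)            # retransmission, pushed toward "10"
combined = 0.5 * y1 + 0.5 * y2     # weighted addition with equal weights

decisions = (nearest(y1), nearest(y2), nearest(combined))
```

Each reception is decoded incorrectly on its own, while the combined symbol falls in the quadrant of the transmitted symbol.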
Referring back to FIG. 8, a mathematical treatment of the combining scheme for a SISO system is considered. To maximize SNR, weights 820 may take on the value,
  
    
  
for each received symbol, yi. These weights may be computed by combiner/preprocessor 800. Therefore, the combined received symbol may be equal to:
  
    
  
where h̃ = √(Σ_{i=1}^{N} |h_i|²) and
  
    
  
Note that noise component ñ in the combined received symbol is Gaussian, because a weighted sum of Gaussian variables is still Gaussian. Furthermore, the weights for MRC are chosen such that the noise has unit variance. Therefore, a noise whitening filter, such as signal processor 712 in FIG. 7, is not needed. As shown in equation (5), the combined symbol, ỹ, may be treated as an individually received signal vector affected by channel h̃ and Gaussian noise ñ.
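A minimal sketch of the SISO combining follows. The weight used, the conjugate of h_i divided by the square root of the summed channel energies, is the conventional MRC weight and is assumed here to correspond to the weights described above; the function name mrc_combine and the channel values are hypothetical.

```python
import math

def mrc_combine(ys, hs):
    """Return (y_tilde, h_tilde) using weights conj(h_i) / sqrt(sum |h_i|^2)."""
    h_tilde = math.sqrt(sum(abs(h) ** 2 for h in hs))
    y_tilde = sum(h.conjugate() * y for h, y in zip(hs, ys)) / h_tilde
    return y_tilde, h_tilde

x = complex(1, 1) / math.sqrt(2)                   # common transmitted symbol
hs = [complex(0.8, -0.3), complex(-0.2, 0.5), complex(0.1, 1.1)]
ys = [h * x for h in hs]                           # noiseless, to expose the channel
y_tilde, h_tilde = mrc_combine(ys, hs)

# Sum of squared weight magnitudes; a value of one means the combining
# does not scale the noise variance.
weight_energy = sum(abs(h.conjugate() / h_tilde) ** 2 for h in hs)
```

Without noise the combined symbol equals h̃ times x, so the N receptions behave like a single reception through the effective channel h̃.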
Therefore, following combining, ML decoder 804 may decode the combined symbol as if it was a single received symbol. ML decoder 804 may calculate a log-likelihood ratio (LLR) for each bit of the common transmit sequence. An LLR is a soft-bit metric often associated with maximum-likelihood decoding. For a received symbol y containing a bit corresponding to transmitted bit bk, where y is received from a channel with response h, the LLR for bit bk may be defined as
  
    
  
Because ỹ may be treated as a single received symbol, the LLR calculation may be expressed as
  
    
  
The sign of the LLR indicates the most likely value of the transmitted bit (1 if positive, 0 if negative), and the magnitude of the LLR indicates the strength or confidence of the decision. Thus, ML decoder 804 may output soft information in the form of an LLR for each bit. Alternatively, ML decoder 804 may map the LLR to a hard decision, and output a binary estimate of the transmitted sequence, or may provide the LLR to a soft decoder. To calculate the LLR for a bit, bk, of the common transmitted symbol, ML decoder 804 may implement:
  
    
  
which will be derived below in equations (7) through (12). The variable Xλ(j) in equation (6) denotes a subset of the signal constellation set whose λth bit equals j for j=0,1. For example, FIGS. 10A and 10B illustrate the four possible subsets for a 4-QAM signal constellation set. 4-QAM is discussed in greater detail above in connection with FIG. 4A. In each figure, the λth bit is underlined for emphasis. Note that, as is consistent with the definition of the subset, the emphasized bit is the same for all members of a subset. Thus, the signal point in quadrant A belongs in subsets X0(0) and X1(0). Similarly, the signal point in quadrant B belongs in subsets X0(1) and X1(0), etc.
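The constellation subsets and an approximate LLR built from them may be sketched as follows. The form of the LLR used here, the difference between the smallest squared distance over the bit-0 subset and the smallest over the bit-1 subset, is the usual max-log form and is an assumption about equation (6). The point coordinates and function names are hypothetical, and bits are indexed from the right so that b0 of “01” equals 1, as in the text.

```python
import math

QAM4 = {  # hypothetical coordinates for the four signal points
    "00": complex(+1, +1) / math.sqrt(2),
    "01": complex(-1, +1) / math.sqrt(2),
    "11": complex(-1, -1) / math.sqrt(2),
    "10": complex(+1, -1) / math.sqrt(2),
}

def subset(lam, j):
    """Signal points whose lam-th bit, counted from the right, equals j."""
    return [p for label, p in QAM4.items() if int(label[::-1][lam]) == j]

def maxlog_llr(y, h, lam, n0=1.0):
    """Approximate LLR of bit lam; a positive value favors bit value 1."""
    d0 = min(abs(y - h * p) ** 2 for p in subset(lam, 0))
    d1 = min(abs(y - h * p) ** 2 for p in subset(lam, 1))
    return (d0 - d1) / n0

h = complex(0.5, -1.2)
y = h * QAM4["01"]                 # noiseless reception of symbol "01"
llr_b0 = maxlog_llr(y, h, 0)
llr_b1 = maxlog_llr(y, h, 1)
```

For the symbol “01” the sketch returns a positive LLR for b0 and a negative LLR for b1, matching the sign convention stated above.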
Equation (6), the symbol-level combining LLR equation, may be derived as follows:
  
    
  
Equations (7) and (8) follow from the definition of the LLR as previously described. Equation (9) is reached by applying Bayes' Theorem, a technique known in the art, to equation (8). Then, equation (10) shows equation (9) written in terms of transmitted symbols, x̂, instead of transmitted bits, bk. For example, in the numerator of equation (9), the probability that b0=1 is the sum of the probabilities that the transmitted symbol was “01” or “11” for a 4-QAM system. As shown in FIG. 10A, “01” and “11” form subset X0(1). Therefore, Pr{ỹ|b0=1,h̃} is equivalent to Σ_{x̂(1)∈X0(1)} Pr{ỹ|x̂(1),h̃}. Finally, equation (11) utilizes the approximation, log Σi ai ≈ log maxi ai, and equation (12) results from plugging in equation (2) for the conditional probabilities. Recall that equation (2) is the conditional probability distribution function (pdf) for an AWGN channel.
The receiver for a SISO system shown in FIG. 8 with MRC is referred to as an optimal receiver scheme for decoding a signal vector. An optimal receiver scheme is hereinafter defined to be one that, given the N received signal vectors, chooses the signal vector that has the highest probability of being the actual transmit signal vector in the presence of AWGN. This is considered optimum, because all information from the N received signals is used fully. Mathematically, an optimum decoding scheme chooses the signal vector, x̂, that maximizes
  
  \Pr\{\hat{x} \mid y_1, \ldots, y_N, h_1, \ldots, h_N\}.   (13)
A decoder that maximizes equation (13) is a maximum-likelihood decoder. Thus, such a decoder may compute an associated LLR for each bit, which is referred to herein as an optimum LLR, or LLRopt.
LLRopt may be derived as follows:
  
    
  
Equations (14) and (15) follow from the definition of the log-likelihood ratio. Most of the remaining equations are derived through substantially the same process as equations (7) through (12). Equation (18) follows from the statistical independence between each received signal vector. Thus, for independent received symbols y1 and y2, Pr(y1,y2)=Pr(y1)Pr(y2), as shown in equation (18).
Although the LLR determined by the symbol-level combining receiver (equation (12)) does not appear to be equal to the optimal LLR (equation (20)), the difference arises due to the log Σi ai ≈ log maxi ai approximation. Before the approximation is applied, it may be shown that the MRC-based symbol-level-combining scheme of FIG. 8 produces an optimal receiver. Recall that equation (10) is the equation for calculating an LLR in the MRC-based symbol-level combining scheme of FIG. 8 prior to applying the approximation. Equation (10) is reproduced below as equation (21). Thus, the following sequence of equations shows that the LLR produced by symbol-level combining is equivalent to the optimal LLR.
  
    
  
Equation (22) follows from equation (21) by plugging in the pdf for an AWGN channel shown in equation (2). The remaining equations follow from mathematical manipulation. Equation (28) is the same as equation (18), which was shown above to be equal to the optimal LLR. Therefore, the decoding scheme used by the receiver in FIG. 8 is an optimal decoding scheme for signals received from an AWGN channel. Even if the receiver implements equation (6), which utilizes the log Σi ai ≈ log maxi ai approximation, the decoding results of the receiver may still be near-optimal.
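The equivalence argued above can be checked numerically. The sketch below computes the exact LLR, without the max approximation, once from the MRC-combined symbol and once jointly from all N received symbols. The constellation coordinates, the random channel draws, and the helper names are assumptions made for illustration, and the Gaussian likelihood exp(−distance²/N0) is assumed for both metrics.

```python
import math
import random

QAM4 = {  # hypothetical coordinates for the four signal points
    "00": complex(+1, +1) / math.sqrt(2),
    "01": complex(-1, +1) / math.sqrt(2),
    "11": complex(-1, -1) / math.sqrt(2),
    "10": complex(+1, -1) / math.sqrt(2),
}

def exact_llr(metric, lam, n0):
    """log of summed likelihoods over the bit-1 subset minus the bit-0 subset."""
    sums = [0.0, 0.0]
    for label, p in QAM4.items():
        sums[int(label[::-1][lam])] += math.exp(-metric(p) / n0)
    return math.log(sums[1]) - math.log(sums[0])

rng = random.Random(1)
n0, x = 1.0, QAM4["01"]
sigma = math.sqrt(n0 / 2)
hs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
ys = [h * x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for h in hs]

# Symbol-level combining with the conventional MRC weights.
h_t = math.sqrt(sum(abs(h) ** 2 for h in hs))
y_t = sum(h.conjugate() * y for h, y in zip(hs, ys)) / h_t

def slc_metric(p):
    return abs(y_t - h_t * p) ** 2

def opt_metric(p):
    return sum(abs(y - h * p) ** 2 for h, y in zip(hs, ys))

llr_slc = [exact_llr(slc_metric, lam, n0) for lam in (0, 1)]
llr_opt = [exact_llr(opt_metric, lam, n0) for lam in (0, 1)]
```

The two metrics differ only by a term that does not depend on the candidate symbol, so that term cancels between numerator and denominator and the two LLR lists agree to rounding error.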
  FIG. 11 shows an illustrative block diagram for a symbol-level combining receiver in a MIMO system. Also, FIG. 11 is a detailed view of one embodiment of the receiver configuration shown in FIG. 7. Combiner 1102 may combine the N received signal vectors by weighted addition. In one embodiment of the present invention, the resulting combined received signal vector may be:
  
    
  
where H̃ = Σ_{i=1}^{N} H_i^* H_i and ñ = Σ_{i=1}^{N} H_i^* n_i. H̃ is an Nt×Nt matrix referred to hereinafter as the combined channel matrix, and may be calculated by combiner/preprocessor 1100. ñ is an Nt×1 noise vector hereinafter referred to as the combined noise vector. Here, the weights in equations (30) and (31) are chosen to maximize the SNR. Although the term, maximal ratio combining (MRC), is typically used for SISO systems, it will also be used herein to refer to a symbol-level, MIMO combining scheme that maximizes the SNR. Therefore, the embodiment described here can be referred to as an MRC MIMO receiver. Following the combination, equation (32) shows that the combined received signal vector may be modeled as a single received vector, ỹ, affected by channel H̃ and noise components ñ. Thus, the combined received signal vector may be decoded in a similar manner as any other received signal vector.
However, the covariance of the combined noise vector, ñ, may easily be shown to equal H̃. Therefore, the noise is not white, because white noise has a diagonal covariance matrix and H̃ is in general not diagonal. Thus, to whiten the noise components, the combined received signal is processed by signal processor 1112. Signal processor 1112 may whiten the noise by multiplying the signal by H̃^(−1/2), where a matrix A^(1/2) is defined to be any matrix where A^(1/2)A^(1/2)=A. The value of H̃^(−1/2) may be obtained from combiner/preprocessor 1100. Following the multiplication, the processed signal, y′, may be equal to:
  
    
  
where the covariance of the processed noise vector, n′, is E[n′_N n′_N^*] = I_Nt, as desired. Therefore, the processed combined signal vector, y′, may be modeled as a single received signal vector affected by an AWGN channel, where the channel response matrix is H̃^(1/2) and the noise vector is n′.
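The MIMO combining and whitening steps may be sketched with NumPy as follows. The Hermitian square root obtained from an eigendecomposition is one choice among the matrices satisfying A^(1/2)A^(1/2)=A; the random channel matrices, the dimensions, and the function names are assumptions for illustration.

```python
import numpy as np

def combine(ys, Hs):
    """Symbol-level combining: sum of H_i^* y_i and sum of H_i^* H_i."""
    y_tilde = sum(H.conj().T @ y for H, y in zip(Hs, ys))
    H_tilde = sum(H.conj().T @ H for H in Hs)
    return y_tilde, H_tilde

def inv_sqrt(A):
    """Hermitian inverse square root of a positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(0)
Hs = [rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
      for _ in range(3)]
x = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)
ys = [H @ x for H in Hs]                  # noiseless, to expose the effective channel

y_tilde, H_tilde = combine(ys, Hs)
W = inv_sqrt(H_tilde)                     # whitening matrix
y_prime = W @ y_tilde                     # processed signal vector
noise_cov = W @ H_tilde @ W.conj().T      # covariance of the whitened noise
```

The whitened noise covariance comes out as the identity, and without noise the processed vector equals the square root of the combined channel matrix applied to x.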
The filtered signal, y′, may then be decoded by ML decoder 1104. The ML decoder may calculate the log-likelihood ratio by implementing the equation,
  
    
  
Equation (35) may be derived as follows:
  
    
  
Equations (36) and (37) follow from the definition of the LLR. The remaining equations may be derived through substantially the same process as the process used to obtain equations (7) through (12). ML decoder 1104 may output the LLRs directly as soft information or may convert the LLRs to another soft-bit metric. Alternatively, ML decoder 1104 may map the LLRs to hard decisions, and output a binary sequence estimate of the transmitted sequence, or may output the LLRs to a soft decoder.
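A brute-force sketch of an approximate LLR computation for a small MIMO system is given below. It assumes the max-log form, in which each LLR is the difference between the smallest decoding metric over candidate vectors with the bit equal to 0 and the smallest over candidates with the bit equal to 1. The constellation coordinates, the left-to-right bit ordering across the Nt symbols, and the function name are hypothetical.

```python
import itertools
import numpy as np

LABELS = ["00", "01", "11", "10"]
POINTS = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def maxlog_llrs(y_prime, H_half, n_t, n0=1.0):
    """Approximate LLR for every bit of the transmit vector (positive favors 1)."""
    best = {}  # (bit position, bit value) -> smallest metric seen so far
    for idx in itertools.product(range(len(LABELS)), repeat=n_t):
        x_hat = POINTS[list(idx)]
        metric = np.linalg.norm(y_prime - H_half @ x_hat) ** 2
        bits = "".join(LABELS[i] for i in idx)
        for k, b in enumerate(bits):
            key = (k, int(b))
            best[key] = min(best.get(key, np.inf), metric)
    return [(best[(k, 0)] - best[(k, 1)]) / n0 for k in range(2 * n_t)]

H_half = np.eye(2, dtype=complex)          # trivial effective channel for the demo
x = POINTS[[1, 3]]                         # symbols "01" and "10"
llrs = maxlog_llrs(H_half @ x, H_half, n_t=2)
hard_bits = [1 if value > 0 else 0 for value in llrs]
```

Exhaustive enumeration grows as M^Nt and is shown only to make the calculation concrete; mapping each LLR to its sign yields the hard-decision output.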
It may be shown that the MRC-based symbol-level combining scheme shown in FIG. 11 is an optimal decoding scheme. An optimal LLR for a MIMO system may be calculated as follows:
  
    
  
Equations (41) and (42) follow from the definition of the log-likelihood ratio. The remaining equations are derived through substantially the same process as equations (7) through (12) or equations (14) through (20).
Although the LLR determined by the symbol-level combining receiver (equation (40)) does not appear to be equal to the optimal LLR (equation (46)), the difference arises due to the log Σi ai ≈ log maxi ai approximation. Before the approximation is applied, it may be shown that the MRC-based symbol-level-combining scheme produces an optimal receiver. The following sequence of equations shows that the LLR produced by symbol-level combining is equivalent to the optimal LLR.
  
    
  
Equation (47) follows from equation (38), the LLR equation for an MRC-based symbol-level combining receiver, by plugging in the pdf for an AWGN channel shown in equation (2). The remaining equations follow from mathematical manipulation. Equation (51) is equivalent to equation (43) for an AWGN channel, which was shown above to be equal to the optimal LLR. Therefore, the decoding scheme used by the receiver in FIG. 11 may be used to implement an optimal decoding scheme for signal vectors received from an AWGN channel. Even if the receiver implements equation (40), which utilizes the log Σi ai ≈ log maxi ai approximation, the decoding results of the receiver may still be near-optimal.
Note that the expression, ‖y′−H̃^(1/2)x‖², is essentially a distance calculation, and is a significant portion of the equation for calculating an LLR, shown above as equation (35), for a MIMO system. Therefore, the ‖y′−H̃^(1/2)x‖² distance equation, or any other such equation in an LLR equation, is hereinafter referred to as a decoding metric. The decoding metric for ML decoder 1104 may be calculated as follows:
  
    
  
Notice that the last term in equation (55) does not depend on the transmitted signal vector. Therefore, the last term is common to both the numerator and denominator in deriving the LLR (derived above in equations (36) through (39)), and may be ignored in the LLR calculation, or equivalently, the circuit implementation of the calculation.
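The observation that one term of the decoding metric is independent of the candidate vector may be checked numerically. The sketch assumes the standard expansion of the squared norm into x*H̃x − 2Re(x*ỹ) + ỹ*H̃⁻¹ỹ, which may differ in layout from equation (55); the random test data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Hs = [rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
      for _ in range(3)]
ys = [rng.standard_normal(2) + 1j * rng.standard_normal(2) for _ in range(3)]
y_tilde = sum(H.conj().T @ y for H, y in zip(Hs, ys))
H_tilde = sum(H.conj().T @ H for H in Hs)

w, V = np.linalg.eigh(H_tilde)
H_half = (V * np.sqrt(w)) @ V.conj().T                # square root of H_tilde
y_prime = (V / np.sqrt(w)) @ V.conj().T @ y_tilde     # whitened combined vector

def full_metric(x):
    """Squared distance between y' and the candidate passed through H_tilde^(1/2)."""
    return np.linalg.norm(y_prime - H_half @ x) ** 2

def reduced_metric(x):
    """Only the terms that depend on the candidate x."""
    return (np.vdot(x, H_tilde @ x) - 2 * np.vdot(x, y_tilde)).real

# Candidate-independent term: y_tilde^* H_tilde^(-1) y_tilde.
constant = np.vdot(y_tilde, np.linalg.solve(H_tilde, y_tilde)).real

candidates = [np.array([1 + 1j, 1 - 1j]) / np.sqrt(2),
              np.array([-1 + 1j, -1 - 1j]) / np.sqrt(2)]
gaps = [full_metric(x) - reduced_metric(x) for x in candidates]
```

The gap between the full and reduced metrics is the same for every candidate, so ranking candidates by the reduced metric gives the same decisions while avoiding the explicit whitening of ỹ.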
The receivers illustrated in FIGS. 7, 8, and 11 show all N received vectors and N channel response matrices as inputs into their respective combining blocks. However, all N signal vectors and N channel matrices are not necessarily given to the combiners at the same time, and the receiver is not required to wait until after all N signal vectors are received to begin operating. Instead, the receivers shown in FIGS. 7, 8, and 11 merely illustrate that the system is capable of combining information from all N transmissions of a common transmit signal vector in any suitable manner. In fact, in some embodiments, such as when a HARQ protocol is used, the combiners may only need to accept one signal vector or channel matrix at any given time, and information on the previous transmissions may be obtained from some other source.
  FIG. 12 shows a more detailed receiver of FIG. 11 that illustrates how a receiver may operate when N signal vectors are received in groups of P signal vectors, where P≤N. The variable P is hereinafter defined to be the number of signal vectors that are received substantially at the same time (e.g., concurrently, within a given amount of time, etc.). Thus, for a HARQ or ARQ protocol, P may be equal to one. For repetition coding or another suitable fixed transmission scheme, P may be equal to N. For other suitable protocols, 1<P<N. For simplicity, it is assumed that N is divisible by P. In this scenario, there are a total of N/P transmissions of P signal vectors. The present invention, however, is not limited to this constrained situation. Also, for clarity, subscripts on any combined vectors or matrices will refer to the number of vectors or matrices included in the combination. For example, ỹi may refer to a combined received signal vector for a combination of received vectors y1, . . . ,yi or yi+1, . . . ,y2i, etc.
When a first set of P signal vectors is received by the system in FIG. 12, no previous information about the common transmit signal vector is available. Therefore, combiners 1202 and 1200 may calculate the combined received vector, ỹP, and the combined channel matrix, H̃P, for the P signal vectors, respectively. The values of ỹP and H̃P may be stored in storage 1222 and 1220, respectively, for future use. Although storage 1220 and 1222 are shown to be separate in FIG. 12, they may also be a single storage system. Combiner/preprocessor 1200 may additionally calculate H̃P^(−1/2) using H̃P. Therefore, ML decoder 1204 may optimally decode for the common transmit signal based on the information available in the P received signal vectors.
When a second set of P signal vectors is received, combiners 1200 and 1202 may combine the newly received signal vectors with the information for the first set of signal vectors stored in storage 1220 and 1222. That is, combiner 1202 may calculate $\tilde{y}_P$ for the second set of P signal vectors, and may add it to the combined vector that has already been calculated. Similarly, combiner 1200 may calculate $\tilde{H}_P$ for the second set of P channel matrices, if they are different than the first set, and may add it to the combined channel matrix that has already been calculated. If the channel matrices are the same as for the first transmission, combiner 1200 may simply utilize the information obtained from the previous calculations. Thus, combiners 1200 and 1202 may obtain combined signal vectors and combined channel matrices for the first 2P signal vectors ($\tilde{y}_{2P}$ and $\tilde{H}_{2P}$) without re-computing information obtained from previous transmissions.
Mathematically, combiners 1200 and 1202 may compute:

$$\tilde{y}_{2P} = \sum_{i=1}^{2P} H_i^{*} y_i = \tilde{y}_P + \sum_{j=P+1}^{2P} H_j^{*} y_j \tag{56}$$

$$\tilde{H}_{2P} = \sum_{i=1}^{2P} H_i^{*} H_i = \tilde{H}_P + \sum_{j=P+1}^{2P} H_j^{*} H_j. \tag{57}$$
$\tilde{y}_{2P}$ and $\tilde{H}_{2P}$ may be stored in storage 1222 and 1220, respectively, by overwriting the values of $\tilde{y}_P$ and $\tilde{H}_P$ that were stored after the first transmission. $\tilde{y}_{2P}$ and $\tilde{H}_{2P}$ may then be utilized when a third set of P signal vectors is received.
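The incremental update of equations (56) and (57) can be illustrated with a minimal Python sketch. It assumes NumPy is available; the class name `IncrementalCombiner` and the helper `batch_combine` are hypothetical names introduced only for this illustration and do not correspond to any element of FIG. 12.

```python
import numpy as np


class IncrementalCombiner:
    """Hypothetical helper: keeps one running copy of y_tilde and H_tilde."""

    def __init__(self, n_tx):
        self.y_tilde = np.zeros(n_tx, dtype=complex)
        self.H_tilde = np.zeros((n_tx, n_tx), dtype=complex)

    def update(self, Hs, ys):
        """Fold a newly received group of P (H_i, y_i) pairs into the stored sums."""
        for H, y in zip(Hs, ys):
            self.y_tilde += H.conj().T @ y   # adds H_i^* y_i, as in equation (56)
            self.H_tilde += H.conj().T @ H   # adds H_i^* H_i, as in equation (57)
        return self.y_tilde, self.H_tilde


def batch_combine(Hs, ys):
    """Reference: recompute both sums from every stored transmission."""
    y_tilde = sum(H.conj().T @ y for H, y in zip(Hs, ys))
    H_tilde = sum(H.conj().T @ H for H in Hs)
    return y_tilde, H_tilde
```

In this sketch the combiner stores only one vector and one matrix regardless of how many groups have been folded in, whereas the batch reference needs every past channel matrix and received vector.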
Using the storage systems shown in FIG. 12, a receiver may incrementally change its combined received vector and combined channel matrix as new sets of signal vectors are received. After each set of P signal vectors is received, ML decoder 1204 produces an optimal estimate of the common transmit signal vector for the given number of signal vectors that have been received. Thus, the effectiveness of the receiver does not depend on the number of received vectors. This is particularly useful for certain transmission protocols, such as HARQ, where the number of received signal vectors may vary.
Another benefit illustrated by the receiver configuration in FIG. 12, which may also be true of any of the other embodiments of the present invention (e.g., FIGS. 7, 11, 13, and 14), is decoder reusability for arbitrary N. That is, only one decoder is implemented no matter how many signal vectors are received. Using a separate decoder for each possible value of N would drastically increase both the amount and complexity of the hardware. In addition, since it would be impractical, and potentially impossible, to implement a different decoder for every N≧1, the decoding flexibility of the receiver would be limited. Therefore, it may be highly beneficial, in terms of decoder complexity and flexibility, that the receiver configurations shown in FIGS. 7, 11, 12, 13, and 14 may implement a single decoder for arbitrary N.
Another benefit of the receiver configuration in FIG. 12 is memory efficiency. After each set of P signal vectors is received, a new combined signal vector, $\tilde{y}$, is calculated. This signal vector may replace the previous information stored in memory. Therefore, the memory requirement of storage 1220 and 1222 does not depend on the number of received vectors. In particular, storage 1220 may be just large enough to store one copy of $\tilde{H}$, and storage 1222 may be just large enough to store one copy of $\tilde{y}$. This is in contrast to a system that re-computes $\tilde{y}$ and $\tilde{H}$ each time a new set of vectors is received. In that scenario, the receiver would need to save the signal vectors and channel response matrices for all previous transmissions.
Referring now to FIGS. 13 and 14, other detailed embodiments of FIG. 7 for a symbol-level combining receiver are shown. These embodiments utilize additional signal processing techniques that may be used to reduce the calculation complexity of the ML decoder. Storage systems, such as storage 1220 and 1222, are not expressly shown in FIGS. 13 and 14, but may be assumed to be part of their corresponding combiners.
  FIG. 13 shows a symbol-level combining receiver that utilizes QR decomposition to reduce the complexity of calculating the ML decoding metric. In addition to combining channel response matrices and determining $\tilde{H}^{1/2}$, combiner/preprocessor 1300 may also factor $\tilde{H}^{1/2}$ into a matrix with orthonormal columns, Q, and a square, upper-triangular matrix, R. Therefore, $\tilde{H}^{1/2}=QR$ and $\tilde{H}^{-1/2}=R^{-1}Q^{*}$. Accordingly, the processed combined received signal vector,

$$y'_N = \tilde{H}_N^{-1/2}\,\tilde{y}_N, \tag{32}$$
computed by signal processor 1312 may be expressed as,

$$y'_N = \tilde{H}_N^{1/2}x + n'_N = QRx + n'_N,$$
where the covariance of the noise is $E[n'_N n'^{*}_N]=I_{N_t}$. Signal processor 1312 may additionally process $y'_N$ by multiplying it by $Q^{*}$. This operation yields,

$$Q^{*}y'_N = Q^{*}QRx + Q^{*}n'_N = Rx + Q^{*}n'_N.$$

Therefore, because $Q^{*}$ is orthonormal and deterministic, the covariance of $Q^{*}n'_N$ is still the identity matrix. Thus, $Q^{*}y'_N$ may be treated as a single received signal vector affected by channel R and white noise $Q^{*}n'_N$.
After signal processor 1312 in FIG. 13 processes y′, decoder 1304 may decode the result using channel information 1308 provided by channel preprocessor 1300. The decoding metric for the processed signal may be $\|Q^{*}y'_N - Rx\|^2$, or $\|Q^{*}R^{-1}Q^{*}\tilde{y}_N - Rx\|^2$. Because R is an upper-triangular matrix, the complexity of the decoding metric may be reduced compared to the complexity of the decoding metric implemented by ML decoder 1204 in FIG. 12.
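The equivalence between the QR-based metric and the metric computed directly from the square root of the combined channel matrix can be checked numerically. The following is a minimal sketch, assuming NumPy and assuming the square root is the Hermitian principal square root obtained from an eigendecomposition; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hermitian_sqrt(H_tilde):
    """Principal square root of a Hermitian positive-definite matrix (assumed form)."""
    w, V = np.linalg.eigh(H_tilde)
    return (V * np.sqrt(w)) @ V.conj().T   # V diag(sqrt(w)) V^*


def direct_metric(H_tilde, y_tilde, x):
    """|| y' - H^{1/2} x ||^2 with y' = H^{-1/2} y_tilde."""
    H_half = hermitian_sqrt(H_tilde)
    y_prime = np.linalg.solve(H_half, y_tilde)
    return np.linalg.norm(y_prime - H_half @ x) ** 2


def qr_metric(H_tilde, y_tilde, x):
    """|| Q^* y' - R x ||^2 with H^{1/2} = Q R and R upper triangular."""
    H_half = hermitian_sqrt(H_tilde)
    Q, R = np.linalg.qr(H_half)
    y_prime = np.linalg.solve(H_half, y_tilde)
    return np.linalg.norm(Q.conj().T @ y_prime - R @ x) ** 2
```

Because Q is unitary, both functions return the same value for any candidate x; the practical difference is that the second form multiplies x by a triangular matrix.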
Referring now to FIG. 14, the illustrated receiver utilizes Cholesky factorization to reduce the complexity of calculating the ML decoding metric. After combiner 1400 generates a combined channel matrix, $\tilde{H}$, the combiner may factor the combined matrix using a Cholesky factorization. The Cholesky factorization factors a square matrix into a lower triangular matrix, L, and its conjugate transpose, L*. Thus, the combined channel matrix may be written as:

$$\tilde{H}_N = LL^{*} \tag{63}$$
Therefore, the combined received signal vector,

$$\tilde{y}_N = \tilde{H}_N x + \tilde{n}_N, \tag{64}$$
from combiner 1402 may be expressed as,

$$\tilde{y}_N = LL^{*}x + \tilde{n}_N. \tag{65}$$
However, the covariance of the combined noise vector, $\tilde{n}$, is equal to $\tilde{H}$. Therefore, the noise is not white, and thus not as easily decodable. To whiten the noise, the combined received vector, $\tilde{y}$, may be passed through signal processor 1412. Signal processor 1412 may multiply $\tilde{y}$ by the inverse of L, or $L^{-1}$, obtained from preprocessor 1400. This produces a processed signal vector,

$$y'_N = L^{-1}\tilde{y}_N = L^{-1}LL^{*}x + L^{-1}\tilde{n}_N = L^{*}x + \tilde{n}'_N,$$

where $\tilde{n}'_N=L^{-1}\tilde{n}_N$. The new noise component, $\tilde{n}'_N$, is white, because $E[\tilde{n}'_N\tilde{n}'^{*}_N]=L^{-1}\tilde{H}_N(L^{-1})^{*}=I_{N_t}$. Therefore, $y'_N$ may be treated as a single received signal affected by channel L* and white noise $\tilde{n}'_N$, and decoded as such.
Therefore, after signal processor 1412 in FIG. 14 produces y′, decoder 1404 may decode y′ using channel information 1408 provided by channel preprocessor 1400. The decoding metric for the processed signal may be $\|L^{-1}\tilde{y}_N - L^{*}x\|^2$. Because L* is an upper-triangular matrix, the complexity of the decoding metric may be reduced compared to the complexity of the decoding metric implemented by ML decoder 1204 in FIG. 12.
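The whitening step and the resulting metric can be sketched in a few lines of Python. This is an illustration only, assuming NumPy, whose `numpy.linalg.cholesky` returns the lower-triangular factor L with $\tilde{H}=LL^{*}$; the function names are hypothetical.

```python
import numpy as np


def cholesky_whiten(H_tilde, y_tilde):
    """Return L (lower triangular, H_tilde = L L^*) and y' = L^{-1} y_tilde."""
    L = np.linalg.cholesky(H_tilde)
    y_prime = np.linalg.solve(L, y_tilde)
    return L, y_prime


def cholesky_metric(H_tilde, y_tilde, x):
    """Decoding metric || L^{-1} y_tilde - L^* x ||^2 for one candidate x."""
    L, y_prime = cholesky_whiten(H_tilde, y_tilde)
    return np.linalg.norm(y_prime - L.conj().T @ x) ** 2


def whitened_noise_covariance(H_tilde):
    """L^{-1} H_tilde (L^{-1})^*, which should equal the identity matrix."""
    L = np.linalg.cholesky(H_tilde)
    L_inv = np.linalg.inv(L)
    return L_inv @ H_tilde @ L_inv.conj().T
```

For a noise-free combined vector $\tilde{y}=\tilde{H}x$, the metric evaluated at the true x is zero up to rounding error, which gives a quick sanity check of the factorization.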
More detailed embodiments of preprocessor 1400, signal processor 1412, and decoder 1404 (FIG. 14) will be described below in connection with FIGS. 15-17, and equations (68) through (120). In particular, FIGS. 15 and 16 and equations (76) through (98) describe how various components in FIG. 14 may be implemented for a 2-input, 2-output MIMO system. FIGS. 15 and 17 and equations (99) through (120) describe how various components in FIG. 14 may be implemented for a 3-input, 3-output MIMO system. Although only 2-input, 2-output and 3-input, 3-output examples are given, it should be understood that the receiver of FIG. 14 may be practiced according to the description below for any R-input, R-output MIMO system.
Preprocessor 1400 may compute the Cholesky factorization of the combined channel matrix, $\tilde{H}=LL^{*}$, using the Cholesky algorithm. The Cholesky algorithm is an R-step recursive algorithm, where R is the number of inputs or outputs in the MIMO system. Thus, the number of calculations performed by the preprocessor increases as the size of the channel matrix grows. At each step, the Cholesky algorithm calculates a matrix $A^{(i)}$, where

$$A^{(i)} = L_i A^{(i+1)} L_i^{*}, \quad i = 1, \ldots, R \tag{68}$$
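The recursive algorithm starts with $A^{(1)}$, which is the original matrix, $\tilde{H}$, and ends with $A^{(R)}=L_R A^{(R+1)} L_R^{*}$, where $A^{(R+1)}$ is the identity matrix, $I_{R\times R}$. Therefore, by plugging in all R equations for $A^{(i)}$, the algorithm yields,

$$\tilde{H} = A^{(1)} = L_1 A^{(2)} L_1^{*} = L_1 L_2 A^{(3)} L_2^{*} L_1^{*} = \cdots = (L_1 L_2 \cdots L_R)(L_R^{*} \cdots L_2^{*} L_1^{*}) = LL^{*}.$$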
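The result, as expected, is a decomposition of $\tilde{H}$ that produces a lower triangular matrix, $L=L_1L_2\cdots L_R$, and its conjugate transpose, $L^{*}=L_R^{*}\cdots L_2^{*}L_1^{*}$. At each stage i, the matrix $A^{(i)}$ may be written as,

$$A^{(i)} = \begin{bmatrix} I_{i-1} & 0 & 0 \\ 0 & a^{(i)} & b^{(i)*} \\ 0 & b^{(i)} & B^{(i)} \end{bmatrix}. \tag{73}$$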
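$a^{(i)}$ is a single entry in $A^{(i)}$, $b^{(i)}$ is an (R−i)×1 vector, $b^{(i)*}$ is the conjugate transpose of $b^{(i)}$, and $B^{(i)}$ is an (R−i)×(R−i) matrix. Using equation (68) and the variables defined in equation (73), the matrices $A^{(i+1)}$, for the next step of the algorithm, and $L_i$ may be written as,

$$A^{(i+1)} = \begin{bmatrix} I_{i-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & B^{(i)} - \frac{1}{a^{(i)}}\, b^{(i)} b^{(i)*} \end{bmatrix} \tag{74}$$

$$L_i = \begin{bmatrix} I_{i-1} & 0 & 0 \\ 0 & \sqrt{a^{(i)}} & 0 \\ 0 & \frac{1}{\sqrt{a^{(i)}}}\, b^{(i)} & I_{R-i} \end{bmatrix}. \tag{75}$$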
Therefore, preprocessor 1400 may successively calculate matrices $L_1,\ldots,L_R$, and compute $L=L_1\cdots L_R$ and its inverse, $L^{-1}=L_R^{-1}L_{R-1}^{-1}\cdots L_1^{-1}$.
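The R-step recursion can be sketched as follows. This is a minimal illustration of the standard outer-product form of the Cholesky algorithm, which the $A^{(i)}$ and $L_i$ notation above appears to follow; it assumes NumPy and a Hermitian positive-definite input, and the function name `recursive_cholesky` is hypothetical.

```python
import numpy as np


def recursive_cholesky(H_tilde):
    """Build L = L_1 L_2 ... L_R step by step, so that H_tilde = L L^*."""
    A = np.array(H_tilde, dtype=complex)        # A^(1) is the original matrix
    R = A.shape[0]
    L = np.eye(R, dtype=complex)
    for i in range(R):                          # index i here is step i+1 in the text
        a = A[i, i].real                        # pivot a^(i), real and positive
        b = A[i + 1:, i].copy()                 # column b^(i) below the pivot
        B = A[i + 1:, i + 1:]                   # trailing block B^(i)
        L_i = np.eye(R, dtype=complex)
        L_i[i, i] = np.sqrt(a)
        L_i[i + 1:, i] = b / np.sqrt(a)
        A_next = np.eye(R, dtype=complex)       # identity block grows by one each step
        A_next[i + 1:, i + 1:] = B - np.outer(b, b.conj()) / a
        L = L @ L_i
        A = A_next                              # after R steps this is the identity
    return L
```

Since the Cholesky factor with a positive real diagonal is unique, the result can be compared against a library routine such as `numpy.linalg.cholesky` as a check.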
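For a 2×2 combined channel matrix, $\tilde{H}=\sum_{i=1}^{N}H_i^{*}H_i$, the matrix components may be represented by $h_{11}$, $h_{12}$, $h_{12}^{*}$, and $h_{22}$. Thus, the first matrix, $A^{(1)}$, may be given by,

$$A^{(1)} = \tilde{H} = \begin{bmatrix} h_{11} & h_{12} \\ h_{12}^{*} & h_{22} \end{bmatrix}. \tag{76}$$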
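Note that $h_{21}$, the first component on the second row, is equal to $h_{12}^{*}$ because $\tilde{H}$ is Hermitian: $\tilde{H}^{*}=(\sum_{i=1}^{N}H_i^{*}H_i)^{*}=\sum_{i=1}^{N}(H_i^{*}H_i)^{*}=\sum_{i=1}^{N}H_i^{*}H_i=\tilde{H}$. Using the variables of equation (73), $A^{(1)}$ may also be expressed as,

$$A^{(1)} = \begin{bmatrix} a^{(1)} & b^{(1)*} \\ b^{(1)} & B^{(1)} \end{bmatrix}, \quad a^{(1)}=h_{11},\; b^{(1)}=h_{12}^{*},\; B^{(1)}=h_{22}.$$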
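The first step in the recursive algorithm involves determining $A^{(2)}$ and $L_1$ using equations (74) and (75), respectively. Accordingly, $A^{(2)}$ and $L_1$ may be given by,

$$A^{(2)} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{h_{11}}\, h_{11}^{(2)} \end{bmatrix}, \qquad L_1 = \begin{bmatrix} \sqrt{h_{11}} & 0 \\ \frac{h_{12}^{*}}{\sqrt{h_{11}}} & 1 \end{bmatrix},$$

where $h_{11}^{(2)}=h_{11}h_{22}-h_{12}^{*}h_{12}$.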
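After determining $A^{(2)}$, the second and final step in the Cholesky algorithm involves calculating $L_2$ and $A^{(3)}$. In accordance with equation (73), $A^{(2)}$ may be written as,

$$A^{(2)} = \begin{bmatrix} 1 & 0 \\ 0 & a^{(2)} \end{bmatrix}, \quad a^{(2)} = \frac{h_{11}^{(2)}}{h_{11}}.$$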
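Thus, $L_2$ and $A^{(3)}$ may be expressed as,

$$L_2 = \begin{bmatrix} 1 & 0 \\ 0 & \frac{\sqrt{h_{11}^{(2)}}}{\sqrt{h_{11}}} \end{bmatrix}, \qquad A^{(3)} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_{2\times 2}.$$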
As expected at the final step of the Cholesky algorithm, the matrix, A(3)=A(R+1), is the identity matrix. Note that there are only two steps in the Cholesky algorithm, because {tilde over (H)} is 2×2.
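The lower triangular matrix, L, where $\tilde{H}=LL^{*}$, may be determined following the recursive algorithm described above. In general, L is determined by multiplying $L_1,\ldots,L_R$. Thus, for the 2×2 case, L may be calculated by multiplying $L_1$ and $L_2$, producing,

$$L = L_1 L_2 = \frac{1}{\sqrt{h_{11}}} \begin{bmatrix} h_{11} & 0 \\ h_{12}^{*} & \sqrt{h_{11}^{(2)}} \end{bmatrix}. \tag{83}$$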
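The inverse of L, or $L^{-1}$, may also be calculated by computing the inverse of both $L_1$ and $L_2$, and multiplying them in reverse order. That is,

$$L^{-1} = L_2^{-1} L_1^{-1} = \frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}} \begin{bmatrix} \sqrt{h_{11}^{(2)}} & 0 \\ -h_{12}^{*} & h_{11} \end{bmatrix}. \tag{84}$$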
Therefore, using the Cholesky algorithm, a preprocessor (e.g., preprocessor 1400) in a MIMO receiver may compute L and L−1 for a combined channel matrix, {tilde over (H)}. These matrices may be used by a signal processor, such as signal processor 1412, or by a decoder, such as ML decoder 1404. Alternatively, a preprocessor may have the equations for one or more factorizations, or equivalent representations of the equations, hard-coded or hard-wired. For example, the preprocessor may hard-code or hard-wire equations (83) and (84).
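A hard-coded 2×2 factorization of the kind mentioned above might look like the following Python sketch. The closed-form expressions are those obtained by carrying the two recursive steps through by hand for a Hermitian positive-definite 2×2 matrix, and they should be read as an illustration rather than as the exact content of equations (83) and (84); NumPy and the function name `cholesky_2x2_closed_form` are assumptions of this example.

```python
import numpy as np


def cholesky_2x2_closed_form(H_tilde):
    """Closed-form L and L^{-1} for a 2x2 Hermitian positive-definite matrix."""
    h11 = H_tilde[0, 0].real
    h12 = H_tilde[0, 1]
    h22 = H_tilde[1, 1].real
    h11_2 = h11 * h22 - abs(h12) ** 2            # h11^(2) = h11*h22 - h12^* h12
    root_h11 = np.sqrt(h11)
    root_h11_2 = np.sqrt(h11_2)
    L = np.array([[h11, 0.0],
                  [np.conj(h12), root_h11_2]]) / root_h11
    L_inv = np.array([[root_h11_2, 0.0],
                      [-np.conj(h12), h11]]) / (root_h11 * root_h11_2)
    return L, L_inv
```

Only two square roots, one product of square roots, and the entries of the combined channel matrix are needed, which is what makes a hard-wired implementation attractive for small R.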
L and L−1, as calculated in the Cholesky algorithm described above, may be used by an ML decoder to compute a log-likelihood ratio for each bit in a transmitted sequence. For example, the receiver in FIG. 14 may be configured such that the ML decoder computes LLRs according to,
  
    
  
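Thus, the decoding metric for this receiver may be $\|L^{-1}Y - L^{*}X\|^2$. Plugging in $L_{2\times 2}$ and $L_{2\times 2}^{-1}$ from the Cholesky factorization described above, the metric implemented by the decoder would be,

$$D = \left\| \frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}} \begin{bmatrix} \sqrt{h_{11}^{(2)}} & 0 \\ -h_{12}^{*} & h_{11} \end{bmatrix} Y - \frac{1}{\sqrt{h_{11}}} \begin{bmatrix} h_{11} & h_{12} \\ 0 & \sqrt{h_{11}^{(2)}} \end{bmatrix} X \right\|^2. \tag{86}$$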
Because $L^{-1}Y$ may be an input into the decoder (e.g., from signal processor 1412 in FIG. 14), the decoder may actually compute,

$$D = \left\| y' - \frac{1}{\sqrt{h_{11}}} \begin{bmatrix} h_{11} & h_{12} \\ 0 & \sqrt{h_{11}^{(2)}} \end{bmatrix} X \right\|^2, \tag{87}$$

where $y'=L^{-1}Y$ is the input. To compute the LLR in equation (85), the decoding metric in equation (87) may be repeatedly computed using all the possible combinations of X. In this way, the decoder may determine the combination that produces the minimum values of equation (87) for b=1 and b=0. For a 2×2 64-QAM system, there are 64 possible values for each symbol in X. Therefore, the distance calculation, $\|L^{-1}Y - L^{*}X\|^2$, would be computed 64×64=4096 times.
Note that the decoding metric shown in equation (87) computes square roots, divisions, and multiplications. These may be computationally expensive operations, and may therefore be time intensive and/or complex in implementation. Furthermore, the metric may be computed repeatedly (e.g., 4096 times). Therefore, the effect of the complex/time-intensive computations may be magnified. The part of the calculation that is repeatedly computed is hereinafter referred to as the critical path. Accordingly, a different decoding strategy is provided in the present invention that reduces the complexity of the critical path. In particular, part of the intensive calculations may be incorporated into a preprocessor (e.g., preprocessor 1400) or into the computation after the minimizing values of X are determined.
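To reduce the complexity of the critical path, the decoding metric shown in equation (86) may be factored as follows:

$$D = \frac{1}{h_{11}h_{11}^{(2)}} \left\| \begin{bmatrix} \sqrt{h_{11}^{(2)}} & 0 \\ -h_{12}^{*} & h_{11} \end{bmatrix} Y - \sqrt{h_{11}^{(2)}} \begin{bmatrix} h_{11} & h_{12} \\ 0 & \sqrt{h_{11}^{(2)}} \end{bmatrix} X \right\|^2.$$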
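For simplicity, the factored decoding metric may be written as,

$$D = \frac{1}{h_{11}h_{11}^{(2)}} \left\| \hat{L}^{-1}Y - \sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X \right\|^2 = \frac{1}{h_{11}h_{11}^{(2)}}\,\tilde{D},$$

where

$$\hat{L}^{-1} = \begin{bmatrix} \sqrt{h_{11}^{(2)}} & 0 \\ -h_{12}^{*} & h_{11} \end{bmatrix}, \qquad \tilde{L}^{*} = \begin{bmatrix} h_{11} & h_{12} \\ 0 & \sqrt{h_{11}^{(2)}} \end{bmatrix},$$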
and {tilde over (D)} is a simplified decoding metric. Therefore, the LLR may be expressed as,
  
    
  
Note that the simplified decoding metric may be repeatedly computed rather than the original decoding metric. Thus, the $\frac{1}{h_{11}h_{11}^{(2)}}$ calculation, which has both a multiplication and a division operation, has been removed from the critical path. Therefore, the complexity of the calculation in the critical path is reduced, but at the expense of increasing the complexity of the final LLR calculation. However, fewer LLRs (e.g., 12 LLRs for a 2×2 64-QAM system, one for each of the 6 bits in each of the 2 symbols) are typically calculated than distance calculations (e.g., 4096). Therefore, removing $\frac{1}{h_{11}h_{11}^{(2)}}$ from the critical path may still provide substantial time and/or complexity savings.
Furthermore, the $\frac{1}{h_{11}h_{11}^{(2)}}$ term used in the final LLR computation may not be needed until after the critical path calculations are completed. Therefore, $\frac{1}{h_{11}h_{11}^{(2)}}$ may be computed during the time that the time-intensive critical path calculations are being performed. Therefore, slow, but less complex multiplication and division implementations may be used without increasing the amount of time needed to compute the LLR. For example, the division operation may be implemented using a serial inversion mechanism.
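The factoring can be checked numerically with a short Python sketch. It assumes NumPy and a 2×2 Hermitian positive-definite combined channel matrix; the matrices `L_hat_inv` and `L_tilde_H` follow the division-free forms implied by the factoring above, and all function names are hypothetical.

```python
import numpy as np


def full_metric(H_tilde, Y, X):
    """Original metric || L^{-1} Y - L^* X ||^2 using a library Cholesky factor."""
    L = np.linalg.cholesky(H_tilde)
    return np.linalg.norm(np.linalg.solve(L, Y) - L.conj().T @ X) ** 2


def simplified_metric_and_modifier(H_tilde, Y, X):
    """Return (D_tilde, modifier) such that the full metric is modifier * D_tilde."""
    h11 = H_tilde[0, 0].real
    h12 = H_tilde[0, 1]
    h22 = H_tilde[1, 1].real
    h11_2 = h11 * h22 - abs(h12) ** 2
    s = np.sqrt(h11_2)                                      # only square root in the loop input
    L_hat_inv = np.array([[s, 0.0], [-np.conj(h12), h11]])  # no division
    L_tilde_H = np.array([[h11, h12], [0.0, s]])            # no division
    D_tilde = np.linalg.norm(L_hat_inv @ Y - s * (L_tilde_H @ X)) ** 2
    modifier = 1.0 / (h11 * h11_2)                          # computed once, off the critical path
    return D_tilde, modifier
```

In a search over many candidate vectors X, only `D_tilde` would be evaluated per candidate; the single division that produces `modifier` is applied once to the final result.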
In some embodiments, rather than computing the squared, simplified decoding metric, a linear approximation may be used. For example, the simplified decoding metric may be,

$$\tilde{D}_{linear} = \left\| \hat{L}^{-1}Y - \sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X \right\|, \tag{93}$$
which leaves out the squared term in the squared, simplified decoding metric. This approximation may reduce the complexity of the calculation within the critical path, and therefore may result in significant time and/or complexity savings in comparison to the squared version of the distance calculation.
If the linear distance metric in equation (93) is used as the decoding metric, the final LLR calculation may be updated to,
  
    
  
Note that the complexity of the critical path has been reduced again at the expense of the complexity of the final LLR calculation. However, because the $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$ term may be computed while the critical path calculations are computed, it may be implemented using techniques that may be low-complexity and time-intensive. Furthermore, if $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$ is implemented in hardware, $\sqrt{h_{11}^{(2)}}$ and $\sqrt{h_{11}}$ may be computed using the same square root circuitry, thereby reducing the total amount of hardware.
Another benefit of implementing the linear decoding metric of equation (93) and the LLR of equation (94) is the fact that the computation is symbol-based rather than vector-based. That is, minimizing $\tilde{D}$ may involve determining values for all the symbols in X. However, minimizing $\tilde{D}_{linear}$ involves determining the minimum value for a single symbol in X. Therefore, a decoder using the linear metric may output results symbol-by-symbol, rather than in groups of symbols. This may be beneficial when hard-decoding is used. Using hard-decoding, $LLR_{linear}$ may also be computed symbol-by-symbol, and may then be directly mapped to a hard decision. Thus, a $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$ correction term may not be needed. Without having to compute a division operation and an extra square root operation, the complexity of the system may be further reduced considerably.
Referring now to FIG. 15, illustrative flow diagram 1500 is shown for decoding a signal vector based on a decoding metric. The signal vector decoded according to the steps of flow diagram 1500 can be a combined signal vector, such as the combined signal vector produced by combiner 1402 of FIG. 14. At step 1502, channel information can be preprocessed (e.g., by preprocessor 1400 of FIG. 14) for use in evaluating a simplified decoding metric. The simplified decoding metric may be derived from factoring a decoding metric. For example, the decoding metric may be $\|L^{-1}Y - L^{*}X\|^2$, where $L^{*}$ and $L^{-1}$ are shown above in equation (86). In this case, the simplified decoding metric may be $\tilde{D}=\|\hat{L}^{-1}Y-\sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X\|^2$. The term factored out of the decoding metric may be $\frac{1}{h_{11}h_{11}^{(2)}}$, which may be referred to as a modifier value or simply a modifier. Alternatively, the simplified decoding metric may be $\tilde{D}_{linear}=\|\hat{L}^{-1}Y-\sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X\|$, and the resulting modifier may be $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$. Thus, the simplified decoding metric may be a function of signal vectors and channel characteristics, while the modifier may be a function of only the channel characteristics.
The channel preprocessing performed at step 1502 can reduce the amount or complexity of computation performed in the critical path. That is, channel preprocessing can compute, in advance of the operations in the critical path, any functions of channel information that would otherwise have been computed, possibly repeatedly, in the critical path. The preprocessors can compute any channel functions that are common to each evaluation of the simplified decoding metric for the different values of X. For example, if the simplified decoding metric is {tilde over (D)} or {tilde over (D)}linear,
  
    
  
may be common to each evaluation of the simplified decoding metric. Therefore a channel preprocessor may compute √{square root over (h11(2))} at step 1502 for use in evaluating the simplified decoding metric, which can also be used to compute the modifier.
With continuing reference to FIG. 15, at step 1504, a soft-bit information metric, such as a log-likelihood ratio, can be computed based on the simplified decoding metric. Continuing the examples described above in step 1502, a soft-bit information metric can be computed in the form of an LLR using the simplified decoding metric, $\tilde{D}$, where
  
    
  
Alternatively, a soft-bit metric can be computed using the linear simplified decoding metric, {tilde over (D)}linear, according to,
  
    
  
The modifier can be computed at step 1506 substantially concurrently with (e.g., in parallel with) step 1504. That is, while the simplified decoding metric is repeatedly computed for different possible values of X, the modifier can be computed. For the example described above, step 1506 may involve computing $\frac{1}{h_{11}h_{11}^{(2)}}$. In this embodiment, the hardware (in a hardware implementation) can include a multiplier and a divider. Alternatively, step 1506 may involve computing $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$, in which case the hardware may additionally include a square root circuit. In some embodiments, some of the resources used to perform step 1504 may also be used to perform operations in step 1506. As described above, because step 1504 may take a relatively long time to complete, any multiplier, divider, or square root circuit for computing step 1506 can be embodied in a slower and lower-complexity implementation.
At step 1508, the soft-bit information metric and the modifier can be combined to produce soft information corresponding to the transmitted digital sequence. The soft-bit information metric and the modifier may be combined by multiplying the two values. In these embodiments, R multipliers may be implemented to multiply the R soft-bit information metrics by the modifier to create R final LLR values. This combining step may be computed following the critical path, and in a postprocessor.
Flow diagram 1500 can be used to decode a combined signal vector in a manner that advantageously pulls as much computation as possible out of the critical path. The computations are instead performed by preprocessors at step 1502 or by postprocessors at step 1508. Thus, computations that are repeatedly performed may have low complexity and/or may be efficient.
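The four steps of flow diagram 1500 can be sketched end to end for a small case. The following Python example is an illustration under several stated assumptions that are not taken from the text: a 2×2 system with 4-QAM, a particular bit-to-symbol mapping (`QAM4`), and a max-log LLR sign convention of "minimum metric over b=0 minus minimum metric over b=1". NumPy is assumed, and all names are hypothetical.

```python
import itertools
import numpy as np

# Assumed bit-to-symbol map for 4-QAM (illustrative only).
QAM4 = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def max_log_llrs(metric):
    """Assumed convention: LLR_k = min(metric | b_k = 0) - min(metric | b_k = 1)."""
    best = np.full((4, 2), np.inf)
    for (bits1, s1), (bits2, s2) in itertools.product(QAM4.items(), repeat=2):
        d = metric(np.array([s1, s2]))          # critical path: once per candidate X
        for k, b in enumerate(bits1 + bits2):
            best[k, b] = min(best[k, b], d)
    return best[:, 0] - best[:, 1]


def decode_2x2(H_tilde, y_tilde):
    h11, h12, h22 = H_tilde[0, 0].real, H_tilde[0, 1], H_tilde[1, 1].real
    # Step 1502: preprocess channel terms shared by every metric evaluation.
    h11_2 = h11 * h22 - abs(h12) ** 2
    s = np.sqrt(h11_2)
    L_hat_inv = np.array([[s, 0.0], [-np.conj(h12), h11]])
    scaled_L = s * np.array([[h11, h12], [0.0, s]])
    y_proc = L_hat_inv @ y_tilde
    # Step 1504: soft-bit metric from the simplified, division-free distance.
    llr_simple = max_log_llrs(lambda X: np.linalg.norm(y_proc - scaled_L @ X) ** 2)
    # Step 1506: modifier; independent of X, so it could run in parallel with 1504.
    modifier = 1.0 / (h11 * h11_2)
    # Step 1508: combine the soft-bit metric and the modifier.
    return modifier * llr_simple


def decode_2x2_reference(H_tilde, y_tilde):
    """Same LLRs computed from the unfactored metric, for comparison."""
    L = np.linalg.cholesky(H_tilde)
    y_prime = np.linalg.solve(L, y_tilde)
    return max_log_llrs(lambda X: np.linalg.norm(y_prime - L.conj().T @ X) ** 2)
```

Because the modifier is positive and common to every candidate, scaling after the minimization gives the same LLR values as the unfactored metric, which is why the division can be deferred to the postprocessing step.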
Referring now to FIG. 16, flow diagram 1600 shows a more detailed, yet still simplified, illustration of decoding a combined signal vector in a 2×2 MIMO system in accordance with the decoding strategy of flow diagram 1500 (FIG. 15). At step 1602, calculations involved for determining $\hat{L}^{-1}$ and $\tilde{L}^{*}$ are computed. For a 2×2 system, where

$$\hat{L}^{-1} = \begin{bmatrix} \sqrt{h_{11}^{(2)}} & 0 \\ -h_{12}^{*} & h_{11} \end{bmatrix}, \qquad \tilde{L}^{*} = \begin{bmatrix} h_{11} & h_{12} \\ 0 & \sqrt{h_{11}^{(2)}} \end{bmatrix},$$

step 1602 may first involve determining $h_{11}^{(2)}=h_{11}h_{22}-h_{12}h_{12}^{*}$, and may then involve determining its square root, $\sqrt{h_{11}^{(2)}}$. These values may be determined by a channel preprocessor (e.g., preprocessor 1400 (FIG. 14)). At step 1604, a combined received signal vector, $\tilde{y}$, may be processed by multiplying the vector by $\hat{L}^{-1}$. The combined received signal vector may be obtained using MRC or any other suitable combining method, such as another form of weighted addition. The combined received signal vector may be obtained from a signal vector combiner, such as MRC combiner 1402 in FIG. 14. The multiplication by $\hat{L}^{-1}$ may be performed by a signal processor, such as signal processor 1412 in FIG. 14.
At step 1606, a simplified decoding metric may be calculated for each possible combination of X. For a 2×2 system, the simplified decoding metric may be $\tilde{D}=\|\hat{L}^{-1}Y-\sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X\|^2$. Thus, at step 1606, $\sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}$ may be multiplied by each valid common transmit signal vector, X, and the result from each multiplication may be used to determine the simplified decoding metric. Alternatively, the decoding metric may be a linear approximation of the simplified decoding metric, $\tilde{D}_{linear}=\|\hat{L}^{-1}Y-\sqrt{h_{11}^{(2)}}\,\tilde{L}^{*}X\|$. Step 1606 may therefore involve computing a suitable decoding metric many times (e.g., 4096 times for a 2×2, 64-QAM system, or 64 times for each symbol). Step 1606 may be performed by a maximum-likelihood decoder, such as by ML decoder 1404 in FIG. 14.
After calculating the decoding metric for each possible X, the minimizing values for b=1 and b=0 are used to determine a simplified LLR at step 1608. As described above, the simplified LLR may be determined by computing,
  
    
  
or $LLR'_{linear}$. The simplified LLR may be computed by a maximum-likelihood decoder, such as by ML decoder 1404 in FIG. 14. At step 1612, the simplified LLR may be modified by a factor to compute the true LLR. In the 2×2 case, the factor may be $\frac{1}{h_{11}h_{11}^{(2)}}$ or $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$, depending on which decoding metric is used. This factor may be determined at step 1610.
Step 1610 may be executed while steps 1604, 1606, and 1608 are being executed. Namely, step 1610 may be computed at any time while steps 1604, 1606 and 1608 are computed. Alternatively, step 1610 may be computed some time before or some time after the other steps. Step 1610 involves performing calculations that are not used by steps 1604, 1606, and 1608, but are used to compute the final LLR value. Thus, step 1610 may perform any suitable calculations that are used in calculations after the critical path (e.g., step 1612). For a 2×2 system, step 1610 may involve computing $h_{11}h_{11}^{(2)}$, and using the result to compute $\frac{1}{h_{11}h_{11}^{(2)}}$. Alternatively, step 1610 may involve computing $\sqrt{h_{11}}$, then using the result to compute $\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}$, and finally computing $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$. Recall that $\sqrt{h_{11}^{(2)}}$ has already been computed at step 1602. Thus, $\sqrt{h_{11}}$ may be computed using the same hardware, if applicable, as the hardware used to compute $\sqrt{h_{11}^{(2)}}$. The resulting factor, $\frac{1}{h_{11}h_{11}^{(2)}}$ or $\frac{1}{\sqrt{h_{11}}\sqrt{h_{11}^{(2)}}}$, may be used by step 1612 to compute the final LLR, as described above. Step 1610 may be computed by a channel processor, such as preprocessor 1400 in FIG. 14.
The Cholesky factorization and decoding examples given above in connection with equations (76) through (95) and FIGS. 15 and 16 have been for 2-input, 2-output MIMO systems. It should be understood, however, that the Cholesky factorization may be applied to any R-input, R-output MIMO system, and flow diagrams 1500 and 1600 may be utilized for any R-input, T-output MIMO system. To illustrate the above-described aspect of the present invention further, a full example for 3×3 $\tilde{H}$ will be described below in connection with FIG. 17 and equations (99) through (120).
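A Cholesky factorization for a 3×3 combined channel matrix, $\tilde{H}$, is described herein. The components of $\tilde{H}$ may be represented by $h_{11}$, $h_{12}$, $h_{12}^{*}$, $h_{13}$, $h_{13}^{*}$, $h_{22}$, $h_{23}$, $h_{23}^{*}$, and $h_{33}$. Thus, the first matrix, $A^{(1)}$, may be given by,

$$A^{(1)} = \tilde{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{12}^{*} & h_{22} & h_{23} \\ h_{13}^{*} & h_{23}^{*} & h_{33} \end{bmatrix}. \tag{99}$$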
In accordance with equation (73), variables a(1), b(1), and B(1) may take on the following values:
  
    
  
The first step in the Cholesky recursive algorithm involves determining A(2) and L1 using equations (74) and (75), respectively. Accordingly, A(2) and L1 may be given by,
  
    
  
where h11(2)=h11h22−h*12h12, h12(2)=h11h23−h*12h13, and h22(2)=h11h33−h*13h13.
After determining A(2), the second step in the Cholesky algorithm involves determining A(3) and L2 using equations (74) and (75) once again. First, from equation (73), variables a(2), b(2), and B(2) may take on the following values:
  
    
  
Therefore, A(3) and L2 may be given by,
  
    
  
where h11(3)=h11(2)h22(2)−h12(2)*h12(2).
After determining A(3), the third and final step in the Cholesky algorithm involves calculating A(4) and L3. In accordance with equation (73), A(3) may be written as,
  
    
  
Thus, A(4) and L3 may be expressed as,
  
    
  
As expected at the final step of the Cholesky algorithm, the matrix A(4)=A(R+1), is the identity matrix.
The lower triangular matrix, L, where $\tilde{H}=LL^{*}$, may be determined following the recursive algorithm described above. In general, L is determined by multiplying $L_1,\ldots,L_R$. Thus, for the 3×3 case, L may be calculated by multiplying $L_1$, $L_2$, and $L_3$. Thus,
  
    
  
The inverse of L, or L−1, may also be calculated by computing the inverses of L1, L2, L3, and multiplying them in reverse order. That is,
  
    
  
If the combined channel matrix is 3×3, a preprocessor may compute the 3×3 Cholesky algorithm, as described above in connection with equations (99) through (119). Alternatively, a preprocessor may have equations for one or more factorizations, or equivalent representations of the equations, hard-coded or hard-wired. For example, the preprocessor may hard-code or hard-wire equations (114) and (117).
  FIG. 17 shows illustrative flow diagram 1700 for decoding a combined received signal vector from a 3×3 MIMO system in accordance with the decoding strategy of flow diagram 1500 (FIG. 15). At step 1702, processing is performed on components of the combined channel response matrix that will be used to calculate a simplified decoding metric. In particular, processing may be performed to determine the $\tilde{L}$ and $\hat{L}^{-1}$ matrices, shown in equations (116) and (119). First, $h_{11}^{(2)}=h_{11}h_{22}-h_{12}^{*}h_{12}$, $h_{12}^{(2)}=h_{11}h_{23}-h_{12}^{*}h_{13}$, and $h_{22}^{(2)}=h_{11}h_{33}-h_{13}^{*}h_{13}$, defined in the first step of the Cholesky factorization, may be determined. Using the results, $h_{11}^{(3)}=h_{11}^{(2)}h_{22}^{(2)}-h_{12}^{(2)*}h_{12}^{(2)}$, defined in the second step of the Cholesky factorization, may be calculated. Also, the square root of $h_{11}^{(2)}$, $\sqrt{h_{11}^{(2)}}$, may be calculated in parallel. Following the determination of $h_{11}^{(3)}$, the square root of $h_{11}^{(3)}$, $\sqrt{h_{11}^{(3)}}$, may also be calculated. In some embodiments, the square root circuitry (if applicable) used to calculate $\sqrt{h_{11}^{(2)}}$ may be used to calculate $\sqrt{h_{11}^{(3)}}$. Finally, using the results of the above calculations, the $\tilde{L}$ and $\hat{L}^{-1}$ matrices may be constructed. Namely, the non-zero components of $\tilde{L}$ and $\hat{L}^{-1}$ (which are $\sqrt{h_{11}^{(2)}}\sqrt{h_{11}^{(3)}}$, $-h_{12}^{*}\sqrt{h_{11}^{(3)}}$, $h_{11}\sqrt{h_{11}^{(3)}}$, $-h_{13}^{*}h_{11}^{(2)}+h_{12}^{*}h_{12}^{(2)*}$, $-h_{11}h_{12}^{(2)}$, and $h_{11}h_{11}^{(2)}$) may be calculated. Note that no division operation is required in any of the above calculations. For at least this reason, the calculations performed in step 1702 may be considerably less complex than any channel processing that would have been necessary using the original decoding metric. In some embodiments, the channel processing calculations described above may be performed by a channel preprocessor (e.g., preprocessor 1400 in FIG. 14).
At step 1704, a combined received signal vector {tilde over (y)}, may be processed by multiplying the vector by {circumflex over (L)}−1, determined from step 1702. The combined received signal vector may be obtained using MRC or any other suitable combining method, such as another form of weighted addition. The combined received signal vector may be obtained from a signal vector combiner, such as MRC combiner 1402 in FIG. 14. The multiplication by {circumflex over (L)}−1 may be performed by a signal processor, such as signal processor 1412 in FIG. 14.
At step 1706, a simplified decoding metric may be calculated for each possible combination of X. For a 3×3 system, the simplified decoding metric may be {tilde over (D)}=∥{circumflex over (L)}−1Y−√{square root over (h11(3))}{tilde over (L)}*X∥2, where h11(3)=h11(2)h22(2)−h12(2)*h12(2) and {tilde over (L)} and {circumflex over (L)}−1 are given by equations (116) and (119), respectively. Thus, at step 1706, √{square root over (h11(3))}{tilde over (L)}* may be multiplied by each valid common transmit signal vector, X, and the result from each multiplication may be used to determine the simplified decoding metric. Alternatively, the decoding metric may be a linear approximation of the simplified decoding metric, {tilde over (D)}linear=∥{circumflex over (L)}−1Y−√{square root over (h11(3))}{tilde over (L)}*X∥. Step 1706 may therefore involve computing a suitable decoding metric many times (e.g., 64×64×64=262,144 times for a 3×3, 64-QAM system, or 64 times for each symbol). Step 1706 may be performed by a maximum-likelihood decoder, such as by ML decoder 1404 in FIG. 14.
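After calculating the decoding metric for each possible X, the minimizing values for b=1 and b=0 are used to determine a simplified LLR at step 1708. The simplified LLR, LLR′ (or LLR′linear when the linear decoding metric is used), may be determined from the difference between the minimum of the decoding metric over the candidates with b=0 and the minimum over the candidates with b=1. The simplified LLR may be computed by a maximum-likelihood decoder, such as by ML decoder 1404 in FIG. 14. At step 1712, the simplified LLR may be modified by a factor to compute the true LLR. In the 3×3 case, the factor may be
$$\frac{1}{h_{11}\,h_{11}^{(2)}\,h_{11}^{(3)}} \quad\text{or}\quad \frac{1}{\sqrt{h_{11}}\,\sqrt{h_{11}^{(2)}}\,\sqrt{h_{11}^{(3)}}},$$
depending on which decoding metric is used. This factor may be determined at step 1710.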
Step 1710 may be executed while steps 1704, 1706, and 1708 are being executed; that is, it may be computed at any time during those steps. Alternatively, step 1710 may be computed some time before or some time after the other steps. Step 1710 involves performing calculations that are not used by steps 1704, 1706, and 1708, but are used to compute the final LLR value. Thus, step 1710 may perform any suitable calculations that are used in calculations after the critical path (e.g., step 1712). For a 3×3 system, step 1710 may involve computing h11h11(2)h11(3), and using the result to compute
$$\frac{1}{h_{11}\,h_{11}^{(2)}\,h_{11}^{(3)}}.$$
Alternatively, step 1710 may involve computing √{square root over (h11)}, then using the result to compute √{square root over (h11)}√{square root over (h11(2))}√{square root over (h11(3))}, and finally computing
$$\frac{1}{\sqrt{h_{11}}\,\sqrt{h_{11}^{(2)}}\,\sqrt{h_{11}^{(3)}}}.$$
Recall that √{square root over (h11(2))} and √{square root over (h11(3))} have already been computed at step 1702. √{square root over (h11)} may therefore be computed using the same hardware, if applicable, as the hardware used to compute √{square root over (h11(2))} and/or √{square root over (h11(3))}. The resulting factor may be used by step 1712 to compute the final LLR, as described above. Step 1710 may be performed by a channel processor, such as preprocessor 1400 in FIG. 14.
As discussed above in the 2×2 example, the decoding implementation shown above has many advantages. Firstly, the division operation is left out of the critical path, and may be performed at substantially the same time as the critical path calculations. Therefore, the division operation may be implemented using a slow, but low-complexity algorithm, such as a serial inversion mechanism. Furthermore, the square root operations are left out of the critical path, which may again allow a receiver designer to lower the complexity of the square root implementations.
Secondly, if the linear simplified decoding metric is used, the decoding may be symbol-based. That is, the decoder may output estimates of each symbol rather than the entire signal vector. If hard-decisions are used, the simplified LLR determined symbol-by-symbol is sufficient to map each symbol to a hard decision. Thus, the modifier is no longer needed, and steps 1710 and 1712 may be completely omitted. Therefore, division operations are not necessary, nor are any final multipliers to compute the true LLR.
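To illustrate why the modifier can be kept off the critical path, the sketch below separates the simplified LLR from the scaling factor. It is a hedged example: the function names, the dictionary-of-metrics input, and the sign convention (minimum over b=0 minus minimum over b=1, with a positive value mapped to b=1) are assumptions made for illustration, and the factor is taken to be the reciprocal of the product computed at step 1710.

```python
import math

def simplified_llr(metrics, bit_of):
    """Simplified LLR from a {candidate: metric} map.

    bit_of(candidate) returns the value (0 or 1) of the bit being decoded.
    """
    min0 = min(d for x, d in metrics.items() if bit_of(x) == 0)
    min1 = min(d for x, d in metrics.items() if bit_of(x) == 1)
    return min0 - min1  # assumed sign convention

def modifier_3x3(h11, h11_2, h11_3, squared=True):
    """Scaling factor for the 3x3 case; the only division in the sketch."""
    if squared:
        return 1.0 / (h11 * h11_2 * h11_3)
    return 1.0 / (math.sqrt(h11) * math.sqrt(h11_2) * math.sqrt(h11_3))

def hard_decision(llr):
    """Map an LLR to a bit; only the sign of the LLR matters."""
    return 1 if llr > 0 else 0
```

Because the modifier is positive for a positive-definite channel matrix, `hard_decision` returns the same bit for the simplified LLR and for the scaled LLR, which is consistent with omitting steps 1710 and 1712 when hard decisions are used.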
Generally, a decoding metric with Cholesky factorization, ∥L−1{tilde over (y)}N−L*x∥2, for an R×R MIMO system may be factored into a squared, simplified decoding metric, {tilde over (D)}=∥{circumflex over (L)}−1Y−√{square root over (h11(R))}{tilde over (L)}*X∥2, and modifier,
$$\frac{1}{\prod_{k=1}^{R} h_{11}^{(k)}},$$
where h11(1)=h11. Alternatively, the decoding metric may be factored into a linear, simplified decoding metric, {tilde over (D)}linear=∥{circumflex over (L)}−1Y−√{square root over (h11(R))}{tilde over (L)}*X∥, and modifier,
$$\frac{1}{\prod_{k=1}^{R} \sqrt{h_{11}^{(k)}}},$$
where h11(1)=h11. Derivations of the equations for 2×2 and 3×3 MIMO systems were given above. Decoding of a signal vector for a general R-input, R-output MIMO system may be performed using the steps shown in FIG. 15, and may have any of the features described above in connection with the 2×2 and 3×3 examples of FIG. 16 and FIG. 17.
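One way to picture the general R×R case is to apply the division-free update from the 2×2 and 3×3 examples recursively. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the recursive update rule is extrapolated from the 3×3 definitions of h11(2), h12(2), h22(2), and h11(3) rather than quoted from the text, and the modifier is taken to be the reciprocal of the product of h11(1) through h11(R).

```python
def leading_terms(H):
    """Return [h11(1), h11(2), ..., h11(R)] for an R x R Hermitian
    positive-definite matrix H given as nested lists."""
    A = [list(row) for row in H]
    terms = []
    while A:
        a11 = A[0][0].real
        terms.append(a11)
        n = len(A)
        # Division-free update: next stage is (n-1) x (n-1).
        A = [[a11 * A[i][j] - A[0][i].conjugate() * A[0][j]
              for j in range(1, n)]
             for i in range(1, n)]
    return terms

def modifier(H, squared=True):
    """Reciprocal of the product of the leading terms (or of their square roots)."""
    prod = 1.0
    for t in leading_terms(H):
        prod *= t
    return 1.0 / prod if squared else 1.0 / prod ** 0.5
```

For R=3 the update reproduces the first and second Cholesky steps described for FIG. 17, so `leading_terms` returns h11, h11(2), and h11(3) in that order.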
Referring now to FIGS. 18A-18G, various exemplary implementations of the present invention are shown.
Referring now to FIG. 18A, the present invention can be implemented in a hard disk drive 1800. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18A at 1802. In some implementations, the signal processing and/or control circuit 1802 and/or other circuits (not shown) in the HDD 1800 may process data, perform coding and/or encryption, perform calculations, and/or format data that is output to and/or received from a magnetic storage medium 1806.
The HDD 1800 may communicate with a host device (not shown) such as a computer, mobile computing devices such as personal digital assistants, cellular phones, media or MP3 players and the like, and/or other devices via one or more wired or wireless communication links 1808. The HDD 1800 may be connected to memory 1809 such as random access memory (RAM), low latency nonvolatile memory such as flash memory, read only memory (ROM) and/or other suitable electronic data storage.
Referring now to FIG. 18B, the present invention can be implemented in a digital versatile disc (DVD) drive 1810. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18B at 1812, and/or mass data storage of the DVD drive 1810. The signal processing and/or control circuit 1812 and/or other circuits (not shown) in the DVD 1810 may process data, perform coding and/or encryption, perform calculations, and/or format data that is read from and/or data written to an optical storage medium 1816. In some implementations, the signal processing and/or control circuit 1812 and/or other circuits (not shown) in the DVD 1810 can also perform other functions such as encoding and/or decoding and/or any other signal processing functions associated with a DVD drive.
The DVD drive 1810 may communicate with an output device (not shown) such as a computer, television or other device via one or more wired or wireless communication links 1817. The DVD 1810 may communicate with mass data storage 1818 that stores data in a nonvolatile manner. The mass data storage 1818 may include a hard disk drive (HDD). The HDD may have the configuration shown in FIG. 18A. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The DVD 1810 may be connected to memory 1819 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage.
Referring now to FIG. 18C, the present invention can be implemented in a high definition television (HDTV) 1820. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18C at 1822, a WLAN interface and/or mass data storage of the HDTV 1820. The HDTV 1820 receives HDTV input signals in either a wired or wireless format and generates HDTV output signals for a display 1826. In some implementations, signal processing circuit and/or control circuit 1822 and/or other circuits (not shown) of the HDTV 1820 may process data, perform coding and/or encryption, perform calculations, format data and/or perform any other type of HDTV processing that may be required.
The HDTV 1820 may communicate with mass data storage 1827 that stores data in a nonvolatile manner such as optical and/or magnetic storage devices for example hard disk drives HDD and/or DVDs. At least one HDD may have the configuration shown in FIG. 18A and/or at least one DVD may have the configuration shown in FIG. 18B. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The HDTV 1820 may be connected to memory 1828 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage. The HDTV 1820 also may support connections with a WLAN via a WLAN network interface 1829.
Referring now to FIG. 18D, the present invention implements a control system of a vehicle 1830, a WLAN interface and/or mass data storage of the vehicle control system. In some implementations, the present invention may implement a powertrain control system 1832 that receives inputs from one or more sensors such as temperature sensors, pressure sensors, rotational sensors, airflow sensors and/or any other suitable sensors and/or that generates one or more output control signals such as engine operating parameters, transmission operating parameters, and/or other control signals.
The present invention may also be implemented in other control systems 1840 of the vehicle 1830. The control system 1840 may likewise receive signals from input sensors 1842 and/or output control signals to one or more output devices 1844. In some implementations, the control system 1840 may be part of an anti-lock braking system (ABS), a navigation system, a telematics system, a lane departure system, an adaptive cruise control system, and/or a vehicle entertainment system such as a stereo, DVD, compact disc and the like. Still other implementations are contemplated.
The powertrain control system 1832 may communicate with mass data storage 1846 that stores data in a nonvolatile manner. The mass data storage 1846 may include optical and/or magnetic storage devices, for example hard disk drives (HDDs) and/or DVDs. At least one HDD may have the configuration shown in FIG. 18A and/or at least one DVD may have the configuration shown in FIG. 18B. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The powertrain control system 1832 may be connected to memory 1847 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage. The powertrain control system 1832 also may support connections with a WLAN via a WLAN network interface 1848. The control system 1840 may also include mass data storage, memory and/or a WLAN interface (all not shown).
Referring now to FIG. 18E, the present invention can be implemented in a cellular phone 1850 that may include a cellular antenna 1851. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18E at 1852, a WLAN interface and/or mass data storage of the cellular phone 1850. In some implementations, the cellular phone 1850 includes a microphone 1856, an audio output 1858 such as a speaker and/or audio output jack, a display 1860 and/or an input device 1862 such as a keypad, pointing device, voice actuation and/or other input device. The signal processing and/or control circuits 1852 and/or other circuits (not shown) in the cellular phone 1850 may process data, perform coding and/or encryption, perform calculations, format data and/or perform other cellular phone functions.
The cellular phone 1850 may communicate with mass data storage 1864 that stores data in a nonvolatile manner such as optical and/or magnetic storage devices for example hard disk drives HDD and/or DVDs. At least one HDD may have the configuration shown in FIG. 18A and/or at least one DVD may have the configuration shown in FIG. 18B. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The cellular phone 1850 may be connected to memory 1866 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage. The cellular phone 1850 also may support connections with a WLAN via a WLAN network interface 1868.
Referring now to FIG. 18F, the present invention can be implemented in a set top box 1880. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18F at 1884, a WLAN interface and/or mass data storage of the set top box 1880. The set top box 1880 receives signals from a source such as a broadband source and outputs standard and/or high definition audio/video signals suitable for a display 1888 such as a television and/or monitor and/or other video and/or audio output devices. The signal processing and/or control circuits 1884 and/or other circuits (not shown) of the set top box 1880 may process data, perform coding and/or encryption, perform calculations, format data and/or perform any other set top box function.
The set top box 1880 may communicate with mass data storage 1890 that stores data in a nonvolatile manner. The mass data storage 1890 may include optical and/or magnetic storage devices, for example hard disk drives (HDDs) and/or DVDs. At least one HDD may have the configuration shown in FIG. 18A and/or at least one DVD may have the configuration shown in FIG. 18B. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The set top box 1880 may be connected to memory 1894 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage. The set top box 1880 also may support connections with a WLAN via a WLAN network interface 1896.
Referring now to FIG. 18G, the present invention can be implemented in a media player 1900. The present invention may implement either or both signal processing and/or control circuits, which are generally identified in FIG. 18G at 1904, a WLAN interface and/or mass data storage of the media player 1900. In some implementations, the media player 1900 includes a display 1907 and/or a user input 1908 such as a keypad, touchpad and the like. In some implementations, the media player 1900 may employ a graphical user interface (GUI) that typically employs menus, drop down menus, icons and/or a point-and-click interface via the display 1907 and/or user input 1908. The media player 1900 further includes an audio output 1909 such as a speaker and/or audio output jack. The signal processing and/or control circuits 1904 and/or other circuits (not shown) of the media player 1900 may process data, perform coding and/or encryption, perform calculations, format data and/or perform any other media player function.
The media player 1900 may communicate with mass data storage 1910 that stores data such as compressed audio and/or video content in a nonvolatile manner. In some implementations, the compressed audio files include files that are compliant with MP3 format or other suitable compressed audio and/or video formats. The mass data storage may include optical and/or magnetic storage devices for example hard disk drives HDD and/or DVDs. At least one HDD may have the configuration shown in FIG. 18A and/or at least one DVD may have the configuration shown in FIG. 18B. The HDD may be a mini HDD that includes one or more platters having a diameter that is smaller than approximately 1.8″. The media player 1900 may be connected to memory 1914 such as RAM, ROM, low latency nonvolatile memory such as flash memory and/or other suitable electronic data storage. The media player 1900 also may support connections with a WLAN via a WLAN network interface 1916. Still other implementations in addition to those described above are contemplated.
The foregoing describes systems and methods for decoding a signal vector, where the receiver may receive multiple instances of the same transmit signal vector. The above described embodiments of the present invention are presented for the purposes of illustration and not of limitation. Furthermore, the present invention is not limited to a particular implementation. The invention may be implemented in hardware, such as on an application specific integrated circuit (ASIC) or on a field-programmable gate array (FPGA). The invention may also be implemented in software.