The present application claims benefit from European Patent Application 13195548.6, filed Dec. 3, 2013, entitled “SYSTEM AND METHOD FOR ACCELERATING A MAXIMUM LIKELIHOOD DECODER IN A MIMO SYSTEM,” incorporated by reference herein in its entirety.
The present invention relates generally to the field of radio frequency (RF) multiple-input-multiple-output (MIMO) systems and more particularly to decoding spatial multiplexed signals in MIMO systems.
An N×M MIMO system 100 is shown in
The channel matrix H includes entries hij that represent the relationship between the signal transmitted from the jth transmitter antenna 102 and the signal received by the ith receiver antenna 104. The dimension of the transmit vector corresponds to the number of transmitter antennas 102 and the dimension of the received vector corresponds to the number of receiver antennas 104.
A decoder 106, e.g., a maximum likelihood (ML) decoder, may decode a received signal ȳ, which may be modeled, for example, as ȳ=Hs̄+n̄ (equation (1)), where s̄ is the transmit vector and n̄ is a noise vector.
A demultiplexer 108 may modulate transmitted signals, for example, by demultiplexing a data stream into separate streams that are modulated and transmitted via respective transmitter antennas 102.
Due to the enormous rate at which information must be decoded, there is a great need in the art for a system and method that accelerates the decoding of signals in MIMO systems.
There is now provided according to embodiments of the invention an improved system and method that accelerates a maximum likelihood decoder in MIMO systems, effectively overcoming the aforementioned difficulties inherent in the art. Embodiments of the invention may provide near or full maximum likelihood performance at significantly faster speeds and/or while supporting increased search complexity.
A decoder in a MIMO system is provided to search a tree graph for a transmit signal that most closely matches a received signal.
The decoder may include a first module to iteratively predict a branch decision at each node along a decision path, wherein in each sequential cycle, the decoder may advance to select a new predicted node. The decoder may include a second module separate from the first module and operating in parallel to the first module. The second module may determine if the predicted branch selected by the first module in a previous clock cycle is correct or incorrect. If the branch decision is correct, the second module does not interrupt the first module; if the branch decision is incorrect, the second module may interrupt the first module to backtrack one cycle and select an alternative predicted node.
The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
In the following description, various aspects of the present invention will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
A maximum likelihood decoder (e.g. decoder 106 of FIG. 1) may decode a received signal by searching for the candidate transmit vector having the minimum distance to the received signal, for example, by searching a tree graph.
A tree graph 200 is shown in FIG. 2.
The maximum likelihood decoder may search tree graph 200 to determine the most likely solution, e.g., a node 204 representing one element in a transmit vector at each tree level 202.
QR decomposition may simplify the search distance computation by enabling incremental searches on the tree as opposed to evaluating each candidate transmit vector in its entirety.
Using QR decomposition, the channel matrix H is decomposed into matrices Q and R, such that H=QR, where Q is an N×N unitary matrix satisfying Q−1=QH (QHQ=I) and R is an N×M upper triangular matrix (e.g. having real entries along its main diagonal).
From equation (1), multiplying the received signal by QH yields ỹ=QHȳ=Rs̄+ñ, where ñ=QHn̄ has the same statistics (e.g. the same covariance matrix E[ññH]=E[n̄n̄H]) as the noise n̄, since Q is unitary.
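As an illustration only (not the patented hardware implementation), the QR preprocessing described above may be sketched in software as follows, assuming H maps an M-element transmit vector to an N-element received vector; the antenna counts, constellation and noise level below are arbitrary example values:

```python
import numpy as np

# Illustrative sketch only (not the patented hardware): QR preprocessing of an
# N x M channel matrix H so that the search can run on the upper triangular R.
N, M = 4, 4                                    # example antenna counts (assumed)
rng = np.random.default_rng(0)
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)

Q, R = np.linalg.qr(H, mode="complete")        # Q is N x N unitary, R is N x M upper triangular
# (a phase normalization of Q and the rows of R can make diag(R) real, as assumed above)

s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=M)          # example QPSK transmit vector
n = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))   # example noise
y = H @ s + n                                  # received vector, as in equation (1)

y_tilde = Q.conj().T @ y                       # rotated received vector: y_tilde = R @ s + Q^H @ n
assert np.allclose(y_tilde, R @ s + Q.conj().T @ n)
```

Because Q is unitary, the rotation preserves the noise statistics and the distances, so searching over ỹ=Rs̄+ñ is equivalent to searching over the original received vector.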
Since matrix R is an upper triangular matrix, computing the squared distance (∥ỹ−Rs̄∥²) may be decomposed into per-row terms and performed incrementally, level-by-level, over the tree graph.
To limit the number of candidate nodes and their distances to compute, a “sphere” decoder may be used, which searches tree graph 200 for a subset of nodes 204 that have an accumulated distance within a sphere of radius r centered at the received vector.
The accumulated distance over the full tree search path may be, for example, d²(s̄)=∥ỹ−Rs̄∥², e.g. the sum of the squared magnitudes of the rows of ỹ−Rs̄.
The accumulated distance may be calculated in a recursive manner for a j-th level path (from a root node to a j-th level node) based on sequential PEDi's for i=0, . . . j, for example, solving each row of the matrix R, from the last row of R upwards to the first row of R, as described below.
An ith-level node may have an accumulated distance from the root node to the branch node equal to a partial Euclidean distance (PEDi). The distance increment from that ith-level node to the selected node of the next level is DIi. The distance increment may be computed, for example, as DIi=|pyii−Rii·ŝi|²,
where pyii is a function of ith-level measurement of the current level and other levels i=0, . . . , M−1. This internal state variable, pyii, may be generated in a recursive manner, for example, as described in reference to PEDi. The accumulated distance for each sequential branch decision PEDi−1 may be the sum of the previous level i accumulated distance PEDi and the distance increment DIi from the branch node to the selected node of the next level i−1, for example, defined as:
PEDi−1(ŝi−1, ŝi, . . . , ŝM−1)=PEDi(ŝi, ŝi+1, . . . , ŝM−1)+DIi
Accordingly, a node 204 in a jth tree level 202 may have an accumulated distance, for example, equal to PED≡∥ỹ−Rŝ∥² accumulated (e.g. summed) over the tree levels along the path from the root node to that node.
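For illustration, the level-by-level recursion above may be sketched as follows; the sketch uses the standard sphere decoder form of the internal state and distance increment, which is an assumption, since the exact hardware formulation of pyii and DIi may differ:

```python
import numpy as np

def partial_distances(y_tilde, R, s_hat):
    """Sketch of the level-by-level PED recursion described above (standard
    sphere decoder form; the exact hardware formulation of pyii and DIi may differ).
    Returns the accumulated distances PED, starting from the last row of R
    (the tree root) and moving up toward the first row."""
    s_hat = np.asarray(s_hat)
    M = len(s_hat)
    ped = 0.0
    peds = []
    for i in range(M - 1, -1, -1):                # last row of R upward, as in the text
        # internal state: the i-th rotated measurement with already-decided symbols removed
        py_i = y_tilde[i] - R[i, i + 1:M] @ s_hat[i + 1:M]
        di = abs(py_i - R[i, i] * s_hat[i]) ** 2  # local distance increment DI
        ped += di                                 # accumulated distance PED along the path
        peds.append(ped)
    return peds
```

Passing a full candidate vector returns, as its last entry, the squared distance ∥ỹ−Rŝ∥² restricted to the first M rows, which is the quantity compared against the search radius.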
The sum of the two terms, PEDi and DIi, may be computed directly (e.g. exactly) or may be approximated to reduce the complexity of the computing hardware and speed up the computation, for example, by using L1, L2 and/or L∞ vector norm recursive calculations as are known in the art. Example metrics for computing the terms PEDi and DIi include the exact squared L2 norm (Re(x)²+Im(x)²) of each complex-valued term x, or lower-complexity L1 (|Re(x)|+|Im(x)|) and L∞ (max(|Re(x)|, |Im(x)|)) approximations.
L2 computations using least square approximations may be implemented with shift and addition hardware components and therefore may be relatively simple (e.g. using fewer and less specialized hardware components) as compared to other methods.
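A brief sketch of these metric choices (the conventional definitions; the exact approximations used in the patented hardware are not spelled out here) is:

```python
def l2_sq(x):   # exact squared Euclidean (L2) metric: needs squarers/multipliers
    return x.real ** 2 + x.imag ** 2

def l1(x):      # L1 approximation of |x|: absolute values and one adder
    return abs(x.real) + abs(x.imag)

def linf(x):    # L-infinity approximation of |x|: absolute values and a comparator
    return max(abs(x.real), abs(x.imag))

z = 3 - 4j
print(l2_sq(z), l1(z), linf(z))   # 25.0, 7.0, 4.0 (the true magnitude |z| is 5)
```

The L1 and L∞ forms avoid multipliers and may trade a small loss in metric accuracy for simpler hardware.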
A search path including a node 204 may be determined to be “inside” a sphere of radius r when the accumulated distance d²(ŝ) of the path is less than the search radius, and “outside” the sphere (e.g. subject to pruning) otherwise.
A conventional branch decision may determine whether or not the decoder selects or prunes a node 204 by (1) calculating the accumulated distance of its search path through the tree and (2) comparing the accumulated distance to the search radius. In order to choose the correct node 204 in tree graph 200, conventional decoders execute both steps (1) and (2), in sequence, using for example 2 clock cycles per node 204 or one cycle at half the cycle frequency (e.g. twice the period of time per cycle). Since conventional systems cannot proceed to a subsequent (i+1-level) node 204 without validating the choice of a previous (i-level) node 204 (every branch decision is dependent on a previous level branch decision), conventional decoders must search a tree graph sequentially, thereby limiting the speed of the decoder.
In accordance with embodiments of the invention, there is provided a decoder designed to mitigate the inherent sequential nature of conventional decoders by parallelizing the branch decision step (1) and the confirmation step (2) that confirms the decision by comparing the accumulated distance with the radius. In accordance with such embodiments, a decoder may execute a branch decision by predicting or “guessing” the correct node without waiting for the confirmation step (2), thereby using only a single clock cycle. In the branch decision step (1), the decoder measures only the distance increment, DIi (e.g. the “local” distance), between each of the plurality of P candidate nodes of the P branches and the received vector, and selects the candidate node associated with the smallest distance increment.
By separating steps (1) and (2) into parallel processes, the branch decision process (1) may proceed along the tree graph making one-cycle branch decisions (instead of the conventional two-cycle decisions). Note, the confirmation process is typically one cycle behind the branch decision process since step (2) uses the selected smallest distance increment node output from step (1) to determine its accuracy (the branch decision process may select an ith-level node while the confirmation process evaluates the previous i−1-level node).
If the confirmation process determines that the branch decision prediction is correct (e.g. the accumulated partial distance of the tree path to the selected node is less than the tree radius), the branch decision process proceeds uninterrupted in only a single cycle, thereby saving a clock cycle and doubling the search decoder speed as compared to conventional two-cycle branch decisions. However, if the branch decision guess is false (e.g. the accumulated partial distance is greater than the tree radius), the decoder may discard the result and start over by selecting the next smallest ordered node, thereby incurring a clock cycle penalty.
In some embodiments of the invention, branch decision predictions based only on local distance increments (step (1)) may be used to search tree graphs that are “biased” or that satisfy a bias condition, e.g., where one of the P candidate nodes for a branch decision is statistically more likely or probable to be correct than the others. In one embodiment, a bias condition may occur when there is a priori knowledge predefining a branch preference or bias of one node over another. For example, in communications, transmissions include forward error correction bits defining certain rules for the data. Since it is more likely that the transmitted data is correct than false, the decoder may proceed to pick a node that satisfies the rules of the forward error correction bits. In one embodiment, the decoder may test for the presence of bias conditions in real-time and, based thereon, may activate or deactivate parallelized predicted branch decisions according to embodiments of the invention. The more biased a tree graph or branch, the more likely its outcome will be known and the greater the rate of success and decoder speed.
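Purely as a hypothetical illustration of such a real-time test (this heuristic is not described in the source), a decoder could, for example, compare the best and second-best local distance increments and enable predicted branch decisions only when the margin between them is large:

```python
def prediction_looks_biased(sorted_increments, margin=2.0):
    """Hypothetical heuristic (not from the source): treat a branch as 'biased'
    when the smallest local distance increment is clearly better than the
    runner-up, so a one-cycle predicted decision is likely to be confirmed."""
    if len(sorted_increments) < 2:
        return True
    best, runner_up = sorted_increments[0], sorted_increments[1]
    return runner_up >= margin * best if best > 0 else True
```

Such a test could be used to switch between the parallelized (predicted) mode and a conventional sequential mode.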
Reference is made to FIG. 3, which schematically illustrates an example tree graph 300 searched using parallelized predicted branch decisions according to embodiments of the invention.
The decoder may initialize a decoding process at “root” node 301 (level zero) and may iteratively execute a branch decision prediction at each shaded node 304 along a branch decision path. In each sequential cycle, the decoder may advance to execute a branch decision prediction at a new node 304 in a subsequent level 302. For each branch decision, the decoder may select a node based only on the minimum distance increment DIi (e.g. local distance). In parallel (at a one cycle time delay), a validation or confirmation process may determine if the branch decision predicting the node 304 made in a previous cycle is correct, e.g., if the branch has a path with an overall accumulated distance PEDi (e.g. global distance) smaller than the search radius r. When the branch decision is correct, the decoder is not interrupted, enabling the decoder to make branch decisions twice as fast as conventional decoders. However, when the branch decision is incorrect, e.g., if PEDi is greater than or equal to r, the decoder may be interrupted and may return to a previous level to select an alternative candidate node, thereby losing a cycle.
In an initial 1st clock cycle, the decoder may execute a level (1) branch decision starting at root node 301 by measuring the distance increment of each of the plurality of candidate nodes 306-312 in level (1) from root node 301, ordering candidate nodes 306-312 according to their associated distances from the smallest to largest distance increments, DIi, and selecting the node 306 associated with the smallest distance increment DIi.
In the next 2nd clock cycle, the decoder may execute a level (2) branch decision by measuring the distance increment of each of the plurality of candidate level (2) nodes 314-320 stemming from the level (1) node 306 selected in the previous cycle, ordering candidate nodes 314-320 from smallest to largest distance increment, and selecting the node 314 associated with the smallest distance increment.
In the next 3rd clock cycle, the decoder may execute a level (3) branch decision by measuring the distance increment of each level (3) candidate node 322-328 stemming from selected level (2) node 314, ordering level (3) candidate nodes 322-328 from smallest to largest distance increment, and selecting the node 326 with the smallest distance increment.
In the next 4th clock cycle, the decoder may execute a level (4) branch decision by measuring the distance increment of each level (4) candidate node 330-336 stemming from selected level (3) node 326, ordering level (4) candidate nodes 330-336 from smallest to largest distance increment, and selecting the node 336 with the smallest distance increment. Node 336 may be referred to as a “leaf” node since it is in the final level (4). This accumulated distance from root node 301 to the first selected leaf (level (4)) node 336 may be set as the initial search radius r value of the sphere decoder.
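The initial descent of cycles 1-4 may be sketched as follows (an illustrative software model, with a hypothetical constellation list supplying the candidate symbols for each branch):

```python
import numpy as np

def initial_radius(y_tilde, R, constellation):
    """Sketch of the initial descent of cycles 1-4 above: greedily pick the
    candidate with the smallest local distance increment at each level and use
    the leaf's accumulated distance as the initial sphere radius r.
    `constellation` is an assumed list of candidate symbols (e.g. QPSK points)."""
    M = R.shape[1]
    s_hat = np.zeros(M, dtype=complex)
    ped = 0.0
    for i in range(M - 1, -1, -1):
        py_i = y_tilde[i] - R[i, i + 1:M] @ s_hat[i + 1:M]
        increments = [abs(py_i - R[i, i] * c) ** 2 for c in constellation]
        best = int(np.argmin(increments))         # smallest local distance increment
        s_hat[i] = constellation[best]
        ped += increments[best]                   # accumulate the path distance
    return ped, s_hat                             # initial radius and first leaf path
```

For example, initial_radius(y_tilde, R, [1+1j, 1-1j, -1+1j, -1-1j]) would return the initial sphere radius and the first (greedy) leaf path for a QPSK constellation.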
After the search radius r is initialized, the decoder may continue searching tree graph 300 by concurrently or in parallel executing an ith level branch decision process and an i−1th level confirmation process. The decoder may only return to nodes in a path having an accumulated distance less than the search radius. Nodes in a path having an accumulated distance greater than or equal to the search radius r may be pruned.
In the next 5th clock cycle, the decoder may return to node 316 having the next smallest distance increment of the candidate nodes 314-320. The decoder may execute in parallel a level (3) branch decision to evaluate candidate nodes 338-344 stemming from node 316. For the node 316 decision, the decoder may select node 340 as having the smallest distance increment. For the confirmation process, the decoder may compare the accumulated distance of the path of node 316 (e.g. from root node 301 to candidate node 316) to the radius r (e.g. from root node 301 to leaf node 336). In the example shown in the figure, the node 316 decision is confirmed to be correct and both parallel processes may proceed to the next level.
In the next 6th clock cycle, the decoder may execute in parallel a level (4) branch decision to select node 346 and a level (3) confirmation process to confirm that node 340 is correct.
In the next 7th clock cycle, the decoder may confirm that the accumulated distance of the path of node 346 is smaller than the radius r and update the radius r to be the smaller accumulated distance. In parallel, the decoder may proceed to the next available highest-level node 308 to select level (2) candidate node 348 (the decoder may skip nodes 318 and 320, which are pruned since their accumulated distances are greater than the search radius).
In the next 8th clock cycle, the decoder may, in parallel, confirm that level (2) node 348 is correct and select a level (3) node 350.
In the next 9th clock cycle, the decoder may, in parallel, confirm that level (3) node 350 is correct and select a level (4) node 352.
In the next 10th clock cycle, the decoder may confirm that level (4) node 352 is correct and update the radius r to be the accumulated distance of the path thereof (from root node 301 to leaf node 352). In parallel, the decoder may proceed to a next available level (3) node 354 (having the second smallest distance increment among nodes 350-358) and select node 356 with the smallest distance increment among candidate nodes 356-364.
In the next 11th clock cycle, the decoder may confirm that the accumulated path distance of the level (4) leaf node 356 is smaller than the radius r and may update the radius r to be the accumulated distance. (No parallel branch decision is made in this final clock cycle since all branches in tree graph 300 have been traversed.)
It may be appreciated that although the discussion in reference to FIG. 3 describes a tree graph 300 with a particular number of levels 302 and candidate nodes per branch, embodiments of the invention may be used with tree graphs of any dimensions.
Reference is made to FIG. 5, which schematically illustrates a decoder 500 according to embodiments of the invention.
Decoder 500 may include two or more separate independently operated processes executed by a branch decision module 501 and a confirmation module 502. Both modules 501 and 502 may run in parallel, for example, with module 502 running at a time delay of, for example, one (1) cycle, from module 501.
Branch decision module 501 may include modules 506-518 operating in a single cycle to select the most probable (smallest local distance increment) one of a plurality of P candidate nodes stemming from the same parent node and module 502 may include modules 520-522 to compute the accumulated or global distance of the path of the node selected by module 501 and determine if it is correct (smaller than a search radius).
A buffer 504 may store an internal state variable pyii and a buffer 506 may store an internal state variable Rii for an ith level branch decision. The state variable pyii is calculated recursively to simplify the current level PEDi (partial Euclidean distance) calculation.
A module 508 may compute a plurality of P candidate transmit signal vector elements ŝi for the ith level branch decision, e.g., one candidate for each of the P branches stemming from the current parent node.
A module 516 may calculate the product signals Rii·ŝi for each of the P candidate elements, which may be used to compute the distance increment DIi associated with each candidate node.
A module 514 may update a branch decision to select the candidate node with the smallest distance increment and proceed to the next tree level based on that branch decision. Module 514 may proceed either down the tree graph (if the current node is in a level 1, . . . , M−1) or up the tree graph (if the current node is a leaf node in the last level M).
After modules 508-518 make a branch decision for a current ith level node (in one clock cycle), module 501 may increment index i to the index of the next tree level and reset modules 508-518 to iterate their operations for the next level branch decision.
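An illustrative software analogue of one such single-cycle branch decision (module numbering taken from the description above; internal details are assumptions) is:

```python
def branch_decision_step(py_ii, R_ii, constellation):
    """Sketch of one single-cycle branch decision (cf. modules 508-518 above;
    internal details are assumptions). Returns the P candidates ordered from
    smallest to largest local distance increment, so a rejected prediction can
    fall back to the next-smallest candidate without recomputation."""
    candidates = []
    for s in constellation:                  # P candidate symbols (module 508)
        product = R_ii * s                   # product signal Rii*s_hat (module 516)
        di = abs(py_ii - product) ** 2       # local distance increment DIi
        candidates.append((di, s))
    candidates.sort(key=lambda c: c[0])      # module 514 selects candidates[0]
    return candidates
```

Keeping the full ordered list allows the next-smallest candidate to be selected immediately if the confirmation module later rejects the prediction.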
In parallel to module 501 executing an iteration of a branch decision, confirmation module 502 may evaluate if a previous iteration of a branch decision is correct.
A module 520 may determine a metric approximation of the accumulated distance, e.g., PED≡∥ỹ−Rŝ∥² accumulated over the path of the selected node, and a module 522 may compare the accumulated distance to the search radius r to confirm or reject the branch decision of module 501.
According to embodiments in which decoder 500 may activate and deactivate parallelized branch decisions, decoder 500 may switch between parallelized processes by executing modules 501 and 502 in parallel and non-parallelized processes by executing modules 501 and 502 in sequence.
Modules 501, 502 and 508-522 may be software-implemented using dedicated instruction(s) or, alternatively, hardware-implemented using designated hardware or circuitry, such as logic arrays (e.g. shift, addition, arithmetic logic units (ALUs)), controllers and/or processors. Memory 503 may store resources (e.g. channel matrix H, upper triangular matrix R, current or past values for Rii, pyii, the sphere radius r, etc.) and instructions for executing processes according to embodiments of the invention.
Reference is made to FIG. 6, which is a flowchart of a method for decoding signals in a MIMO system according to embodiments of the invention.
In operation 610, a first module (e.g. branch prediction module 501 of FIG. 5) may execute a branch decision prediction, for example, by measuring the local distance increment of each of a plurality of candidate nodes stemming from a current parent node and selecting the candidate node with the smallest distance increment.
In operation 620, in parallel to operation 610, a second module (e.g. confirmation module 502 of FIG. 5) may evaluate the candidate node selected by the first module in a previous cycle to determine if that branch decision prediction is correct.
In operation 630, the second module may compute the global or accumulated distance of the selected node (e.g. by adding the distance increment for the node in each level of a search path from a root node to the selected candidate node), for example, by adding the current level's distance increment (e.g., the difference between the received signal and the transmitted signal corresponding to the selected node) to the previous level's accumulated distance (e.g. the distance increments accumulated for each node in a search path level-by-level from the root node to the parent node at the branch decision juncture).
In operation 640, the second module may compare the accumulated distance of the path of the selected node to a search radius, r.
If the accumulated distance is less than the search radius, the branch decision is correct and a process or processor may proceed to the next level to repeat operation 610 at the next branch node along the selected path. If the branch node is a leaf (final-level) node, the search radius may be updated to be the accumulated distance of the selected node.
However, if the accumulated distance is greater than or equal to the search radius, the branch decision is incorrect and a process or processor may proceed to operation 650.
In operation 650, the first module may override the branch prediction, for example, pruning the selected node, and select an alternative candidate node with the next smallest local distance increment. A process or processor may backtrack to operation 620 to evaluate the alternative candidate node.
The process may proceed iteratively until a final node is evaluated.
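The overall flow of operations 610-650 may be modeled in software, for example, as the following sketch; the helper structure and names are hypothetical, and the one-cycle-delayed hardware confirmation is modeled here as an immediate check after each prediction:

```python
import numpy as np

def sphere_decode(y_tilde, R, constellation):
    """Software sketch of operations 610-650 (helper structure and names are
    hypothetical). At each level the smallest-increment candidate is 'predicted'
    first; the prediction is 'confirmed' only if its accumulated distance stays
    below the current radius, otherwise the search falls back to the next
    candidate."""
    M = R.shape[1]
    radius = np.inf
    best_path = None
    s_hat = np.zeros(M, dtype=complex)

    def ordered_candidates(level):
        # (DI, symbol) pairs for this level, ordered smallest increment first
        py = y_tilde[level] - R[level, level + 1:M] @ s_hat[level + 1:M]
        cands = [(abs(py - R[level, level] * c) ** 2, c) for c in constellation]
        cands.sort(key=lambda t: t[0])
        return cands

    def search(level, ped):
        nonlocal radius, best_path
        for di, c in ordered_candidates(level):   # operation 610: predict best first
            new_ped = ped + di                    # operations 620-630: accumulated distance
            if new_ped >= radius:                 # operation 640: compare with radius r
                break                             # operation 650: prune; ordered list, so stop
            s_hat[level] = c
            if level == 0:                        # leaf reached: shrink the sphere
                radius = new_ped
                best_path = s_hat.copy()
            else:
                search(level - 1, new_ped)

    search(M - 1, 0.0)
    return best_path, radius
```

Because the candidates at each level are ordered from smallest to largest increment, a failed comparison against the radius prunes the remaining siblings as well, and the radius shrinks each time a closer leaf is reached; the result should match an exhaustive maximum likelihood search over all candidate transmit vectors.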
Reference is made to example measurements of false prediction ratios for embodiments of the invention over a range of signal-to-noise ratio (SNR) values.
To compute the performance gain for such mean penalty ratios, each incorrect prediction event may incur a one cycle penalty, while the operating frequency of the search is doubled. Parameters such as the false prediction ratio and the resulting mean cycle penalty per node may be defined to quantify the gain.
Considering the false prediction ratios measured for the example data, a corresponding performance gain may be computed for each SNR range.
For the low-medium SNR range data, the performance gain may be computed from the mean false prediction (penalty) ratio measured over that range.
For the high SNR range data, the performance gain may likewise be computed from the mean false prediction ratio measured over that range.
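The measured ratios and the resulting gains are not reproduced here. As a rough, assumed model consistent with the statements above (doubled clock, one penalty cycle per false prediction), a false prediction ratio p gives an average of 1+p short cycles per node versus two short cycles conventionally, for an estimated gain of about 2/(1+p):

```python
# Back-of-envelope model (an assumption, not the patent's reported figures):
# doubled clock, one extra cycle per false prediction  =>  gain ~ 2 / (1 + p)
for p in (0.05, 0.1, 0.2, 0.5):
    gain = 2.0 / (1.0 + p)
    print(f"false prediction ratio {p:.2f} -> estimated speed-up x{gain:.2f}")
```

With p approaching zero, the estimate approaches the two-fold gain noted elsewhere in this description.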
In accordance with any of the aforementioned embodiments of the invention, a tree branch decision may be divided into two phases, executed in parallel, to search a tree graph with up to double the speed of conventional systems that execute these phases in sequence.
Conventional sphere decoders may traverse a tree graph in search of the ML solution for a given data tone. Each chosen node lies along a path in the tree that has an accumulated distance that is compared against the tree radius. If the path distance exceeds the tree radius, the node may be pruned (discarded). The calculation of the accumulated distance and comparison to the tree radius typically make up the critical path of the conventional design, thereby limiting the speed of the ML decoder engine.
However, in biased trees where one node is statistically preferable over others, it is possible to choose a node by assuming that it is correct. The choice may later be checked and fixed if it turns out to be incorrect. The ML decoder engine operating according to the aforementioned embodiments of the invention may be executed in two parallel phases:
Phase 1: An optimal candidate node (contributing a minimal distance increment from the received signal) may be chosen and its path may be assumed to be smaller than the tree radius.
Phase 2: The node's accumulated distance may be calculated and compared one phase/cycle after the node is chosen:
For a correct prediction: In case the accumulated distance is smaller than the tree radius, the tree search continues.
For a false prediction: In case the accumulated distance is larger than the tree radius, a penalty cycle is incurred and a new next-smallest candidate node is chosen.
This implementation may double the clock speed of the ML decoder engine.
The performance improvement may be proportional to the tree bias, for example, depending on the number of correct vs. false predictions.
In accordance with the present invention and as used herein, the following terms are defined with the following meanings, unless explicitly stated otherwise.
Executed “in parallel” as used herein refers to executed concurrently, simultaneously, or during completely or partially overlapping time intervals. For example, a branch decision process may run in parallel (in overlapping time intervals) to a confirmation process even though the confirmation process is generally run at a time delay of one cycle relative to the branch decision process. For example, since the branch decision to select an i+1-level node occurs during the time when the confirmation process evaluates a prior i-level node, in parallel may refer to the simultaneous execution of the branch decision and confirmation processes in general, and/or in parallel may refer to processing a specific node at less than or equal to a one cycle delay. Further, in some embodiments, M cycles may be used to generate the initial search radius for an M-level tree. Accordingly, the confirmation process cannot begin until the search radius is initialized, causing a delay of M cycles for the confirmation process relative to the branch prediction process. In such embodiments, in parallel may include staggered processing in which only the branch decision process runs for the first M cycles, after which the confirmation process initiates its processing at the same or overlapping times. Similarly, the final cycle of the tree search may include only a confirmation process and not a branch prediction process since the branch prediction process evaluates its final node one clock cycle before the confirmation process.
A “parent” or “child” node or level is a relative term as used herein that refers to a previous ((i−1)th) and a subsequent ((i+1)th) node or level, respectively, relative to the current (ith) node or level.
The maximum likelihood decoder (e.g., decoder 500 of FIG. 5) may operate according to any of the embodiments described herein.
In accordance with any of the aforementioned embodiments of the invention, systems and methods may be software-implemented using dedicated instruction(s) or, alternatively, hardware-implemented using designated circuitry and/or logic arrays.
In accordance with any of the aforementioned embodiments of the invention, systems and methods may be executed using an article such as a computer or processor readable non-transitory storage medium, or a computer or processor storage medium, such as for example a memory (e.g. memory 503 of FIG. 5), a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which, when executed by a processor or controller, carry out methods disclosed herein.
Different embodiments are disclosed herein. Features of certain embodiments may be combined with features of other embodiments; thus certain embodiments may be combinations of features of multiple embodiments.
Although the particular embodiments shown and described above will prove to be useful for the many distribution systems to which the present invention pertains, further modifications of the present invention will occur to persons skilled in the art. All such modifications are deemed to be within the scope and spirit of the present invention as defined by the appended claims.