Finite-alphabet beamforming for multi-antenna wideband systems

Information

  • Patent Grant
  • 11115095
  • Patent Number
    11,115,095
  • Date Filed
    Thursday, March 5, 2020
  • Date Issued
    Tuesday, September 7, 2021
Abstract
Finite-alphabet beamforming for multi-antenna wideband systems is provided. The combination of massive multi-user multiple-input multiple-output (MU-MIMO) technology and millimeter-wave (mmWave) communication enables unprecedentedly high data rates for radio frequency (RF) communications. In such systems, beamforming must be performed at extremely high rates over hundreds of antennas. For example, spatial equalization applies beamforming in the uplink to mitigate interference among user equipment (UEs) at a base station (BS). Finite-alphabet equalization provides a new paradigm that restricts the entries of a spatial equalization matrix to low-resolution numbers, enabling high-throughput, low-power, and low-cost equalization hardware. Similarly, precoding applies beamforming in the downlink to maximize the reception of a signal transmitted from a BS to a target UE. Finite-alphabet precoding can be applied in the downlink to similarly improve power and cost in precoding hardware.
Description
FIELD OF THE DISCLOSURE

The present disclosure relates to equalization of radio frequency (RF) signals in multi-antenna RF systems and devices.


BACKGROUND

Future wireless systems are expected to deliver higher data-rates within the already crowded radio frequency (RF) spectrum. Emerging technologies, such as millimeter-wave (mmWave) or terahertz (THz) communication and massive multi-user multiple-input multiple-output (MU-MIMO), have risen as promising candidates to provide such high data rates. An abundance of available bandwidth at mmWave and THz frequencies, combined with fine-grained beamforming capabilities provided by massive MU-MIMO, can enable high-throughput communication to multiple user equipments (UEs) in a same time-frequency resource. However, these technologies require hundreds of antennas at a base station (BS), each receiving or transmitting wideband signals, which necessitates sophisticated RF and digital baseband processing circuitry. As a result, circuit power consumption and system costs may increase significantly, which may hamper the success of this technology.


To reduce power consumption, the literature has largely focused on multi-antenna mmWave and THz architectures that rely on hybrid analog-digital solutions. Albeit energy efficient, such architectures have limited multiplexing capabilities as they are only capable of simultaneously combining signals coming from a restricted number of directions; this key limitation may result in a reduced spectral efficiency. An emerging alternative is the use of all-digital BS architectures. While it is commonly believed that all-digital BS designs would be energy inefficient, it has been shown recently that the power consumption of the RF and data-conversion elements in an all-digital BS is comparable to that of hybrid solutions, provided that the resolution of the data converters at the BS is suitably reduced. However, power consumption and system costs of baseband processing for all-digital BS architectures are largely unexplored.



FIG. 1 is a schematic diagram of a traditional MU-MIMO wireless communication system 10 including multiple UEs 12 and a BS 14. Spatial equalization in the uplink (UEs 12 transmit to BS 14) is among the most power- and throughput-critical tasks in all-digital BS architectures. The purpose of spatial equalization is to collect the signals from all U UEs 12 at B BS antennas 16, while suppressing inter-UE interference. Mathematically, spatial equalization amounts to one or multiple U×B matrix-vector multiplications involving a U×B equalization matrix and a B-dimensional received vector. These multiplications need to be performed on a per-baseband-sample basis (e.g., at the sample rate of analog-to-digital converters in the BS 14).


Unfortunately, for a BS 14 with a large array of antennas 16 serving multiple UEs 12, a conventional matrix-vector-product circuit consumes a large amount of power. For example, a BS 14 with B=256 antennas 16 serving U=16 UEs 12 which uses a conventional processing circuit operating at 2 G vectors/second consumes over 28 watts (W) and occupies more than 128 square millimeters (mm²) when implemented in 28 nanometer (nm) complementary metal-oxide-semiconductor (CMOS) technology. If more BS antennas 16 and/or more UEs 12 are considered, circuit power and area increase even further.
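To put the quoted configuration in perspective, the following minimal sketch counts the complex multiply-accumulate (MAC) operations implied by one U×B matrix-vector product per received vector. The function name and the convention of counting one complex MAC per matrix entry are illustrative assumptions and are not taken from the disclosure.

```python
def complex_macs_per_second(num_ues, num_antennas, vectors_per_second):
    """Complex MACs per second for one U x B matrix-vector product per received vector."""
    # Each of the U outputs needs B complex multiply-accumulates (assumed cost model).
    return num_ues * num_antennas * vectors_per_second

# Values quoted in the text: B = 256 antennas, U = 16 UEs, 2 G vectors/second.
macs = complex_macs_per_second(16, 256, 2_000_000_000)
print(f"{macs:.3e} complex MACs per second")
```

Under this assumed cost model, the load is on the order of several trillion complex MACs per second, which is consistent with the qualitative point that a conventional circuit is power hungry.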


SUMMARY

Finite-alphabet beamforming for multi-antenna wideband systems is provided. The combination of massive multi-user multiple-input multiple-output (MU-MIMO) technology and millimeter-wave (mmWave) communication enables unprecedentedly high data rates for radio frequency (RF) communications. In such systems, beamforming must be performed at extremely high rates over hundreds of antennas. For example, spatial equalization applies beamforming in the uplink to mitigate interference among user equipment (UEs) at a base station (BS). Conventional hardware designs of spatial equalizers in all-digital BSs, where each antenna is equipped with a pair of data converters, would entail prohibitively high power consumption and implementation costs. To address these issues, finite-alphabet equalization provides a new paradigm that restricts the entries of a spatial equalization matrix to low-resolution numbers, enabling high-throughput, low-power, and low-cost equalization hardware.


Similarly, precoding applies beamforming in the downlink to maximize the reception of a signal transmitted from a BS to a target UE. Finite-alphabet precoding can be applied in the downlink to similarly improve power and cost in precoding hardware.


To minimize the performance loss of finite-alphabet equalization, embodiments generate an equalization matrix using finite-alphabet minimum mean-square error (MMSE) equalization (FAME), which significantly outperforms a naïve quantization of a linear MMSE matrix. A similar approach is applied to replace part of linear Wiener filter (WF) precoding matrices with a finite-alphabet WF precoding (FAWP) matrix for the downlink. Efficient algorithms can be deployed to quantize solutions to a non-deterministic polynomial-time (NP)-hard FAME/FAWP problem defined herein. Through this, it is shown that for massive MU-MIMO mmWave systems, near-optimal error-rate performance can be achieved with equalization coefficients quantized to only 1-3 bits. In addition, very-large scale integration (VLSI) results demonstrate a reduction in equalization power and area by at least a factor of 3.9× and 5.8×, respectively, over traditional approaches.


An exemplary embodiment provides a method for digitally beamforming signals for an antenna array. The method includes estimating a wireless channel associated with a plurality of digital baseband signals and an antenna array to produce estimates for the wireless channel. The method further includes beamforming, with a finite-alphabet equalizer, the plurality of digital baseband signals based on the estimates of the wireless channel.


Another exemplary embodiment provides a radio access node. The radio access node includes an antenna array and channel estimator circuitry coupled to the antenna array and configured to provide an estimate of a wireless channel associated with a plurality of digital baseband signals for the antenna array. The radio access node further includes finite-alphabet beamforming circuitry coupled to the antenna array and the channel estimator circuitry, the finite-alphabet beamforming circuitry being configured to beamform the plurality of digital baseband signals in accordance with the estimation of the wireless channel.


Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.





BRIEF DESCRIPTION OF THE DRAWING FIGURES

The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.



FIG. 1 is a schematic diagram of a traditional multi-user multiple-input multiple-output (MU-MIMO) wireless communication system including multiple user equipments (UEs) and a base station (BS).



FIG. 2 is a schematic diagram of a wireless communication system with a radio access node having finite-alphabet beamforming circuitry.



FIG. 3 is a graphical representation of uncoded bit error-rate (BER) for spatial equalization approaches, including the finite-alphabet equalizer of FIG. 2.



FIG. 4A is a graphical representation of error-vector magnitude (EVM) performance for a linear minimum mean-square error (L-MMSE) equalizer.



FIG. 4B is a graphical representation of EVM performance for a naïve finite-alphabet L-MMSE (FL-MMSE) equalizer.



FIG. 4C is a graphical representation of EVM performance for a finite-alphabet MMSE equalization (FAME) using an exhaustive search FAME-EXH embodiment of the finite-alphabet equalizer of FIG. 2.



FIG. 5A is a graphical representation of beam- and null-forming capabilities for the L-MMSE equalizer.



FIG. 5B is a graphical representation of beam- and null-forming capabilities for the FL-MMSE equalizer.



FIG. 5C is a graphical representation of beam- and null-forming capabilities for the FAME-EXH embodiment of the finite-alphabet equalizer of FIG. 2.



FIG. 6A is a graphical representation of uncoded BER for an 8 antenna, 2 UE scenario using quadrature phase-shift keying (QPSK) in an independent and identically distributed (i.i.d.) Rayleigh-fading scenario under various equalization approaches.



FIG. 6B is a graphical representation of uncoded BER for a 64 antenna, 4 UE scenario using 16-quadrature amplitude modulation (QAM) in an i.i.d. Rayleigh-fading scenario under various equalization approaches.



FIG. 7A is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and an i.i.d. Rayleigh-fading channel model under various equalization approaches.



FIG. 7B is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and a non-line of sight (LoS) QuaDRiGa millimeter-wave (mmWave) channel model under various equalization approaches.



FIG. 7C is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and an LoS QuaDRiGa mmWave channel model under various equalization approaches.



FIG. 8A is a graphical representation of power as a function of equalizer resolution for a finite-alphabet equalizer hardware designed in 28 nanometer (nm) complementary metal-oxide-semiconductor (CMOS) technology for a 256 antenna, 16 UE massive MU-MIMO embodiment of the wireless communication system of FIG. 2.



FIG. 8B is a graphical representation of area as a function of equalizer resolution for the finite-alphabet equalizer hardware of FIG. 8A.



FIG. 9A is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using an i.i.d. Rayleigh-fading channel model under various equalization approaches.



FIG. 9B is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using a non-LoS QuaDRiGa mmWave channel model under various equalization approaches.



FIG. 9C is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using an LoS QuaDRiGa mmWave channel model under various equalization approaches.



FIG. 10A is a graphical representation of uncoded BER for a finite-alphabet precoder assuming perfect knowledge of a UE-side precoding factor β.



FIG. 10B is a graphical representation of uncoded BER for the finite-alphabet precoder with estimated UE-side precoding factor/using one pilot.



FIG. 11A is a graphical representation of EVM for the different finite-alphabet Wiener filter precoding (FAWP) precoders of FIG. 10B and a 1-bit alphabet.



FIG. 11B is a graphical representation of EVM for the different FAWP precoders of FIG. 10B and a 2-bit alphabet.



FIG. 11C is a graphical representation of EVM for the different FAWP precoders of FIG. 10B and a 3-bit alphabet.



FIG. 12A is a graphical representation of uncoded BER using the non-LoS QuaDRiGa mmWave channel model for the different FAWP precoders of FIGS. 11A-11C.



FIG. 12B is a graphical representation of uncoded BER using the LoS QuaDRiGa mmWave channel model for the different FAWP precoders of FIGS. 11A-11C.



FIG. 13 is a block diagram of the radio access node of FIG. 2 suitable for implementing digital beamforming of signals according to embodiments disclosed herein.





DETAILED DESCRIPTION

The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.


It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.


It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.


The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including” when used herein specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.


Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.


Finite-alphabet beamforming for multi-antenna wideband systems is provided. The combination of massive multi-user multiple-input multiple-output (MU-MIMO) technology and millimeter-wave (mmWave) communication enables unprecedentedly high data rates for radio frequency (RF) communications. In such systems, beamforming must be performed at extremely high rates over hundreds of antennas. For example, spatial equalization applies beamforming in the uplink to mitigate interference among user equipment (UEs) at a base station (BS). Conventional hardware designs of spatial equalizers in all-digital BSs, where each antenna is equipped with a pair of data converters, would entail prohibitively high power consumption and implementation costs. To address these issues, finite-alphabet equalization provides a new paradigm that restricts the entries of a spatial equalization matrix to low-resolution numbers, enabling high-throughput, low-power, and low-cost equalization hardware.


Similarly, precoding applies beamforming in the downlink to maximize the reception of a signal transmitted from a BS to a target UE. Finite-alphabet precoding can be applied in the downlink to similarly improve power and cost in precoding hardware.


To minimize the performance loss of finite-alphabet equalization, embodiments generate an equalization matrix using finite-alphabet minimum mean-square error (MMSE) equalization (FAME), which significantly outperforms a naïve quantization of a linear MMSE matrix. A similar approach is applied to replace part of linear Wiener filter (WF) precoding matrices with a finite-alphabet WF precoding (FAWP) matrix for the downlink. Efficient algorithms can be deployed to quantize solutions to a non-deterministic polynomial-time (NP)-hard FAME/FAWP problem defined herein. Through this, it is shown that for massive MU-MIMO mmWave systems, near-optimal error-rate performance can be achieved with equalization coefficients quantized to only 1-3 bits. In addition, very-large scale integration (VLSI) results demonstrate a reduction in equalization power and area by at least a factor of 3.9× and 5.8×, respectively, over traditional approaches.



FIG. 2 is a schematic diagram of a wireless communication system 18 with a radio access node 20 having finite-alphabet beamforming circuitry. The finite-alphabet beamforming circuitry can include a finite-alphabet equalizer 22 and/or a finite-alphabet precoder 24. The radio access node 20 can be any node in a radio access network (RAN) of the wireless communications system 18 (e.g., a cellular communications network) which can transmit and/or receive RF signals. In some examples, the radio access node 20 is a BS (e.g., a New Radio (NR) BS (gNB) in a Third Generation Partnership Project (3GPP) Fifth Generation (5G) NR network), a low-power BS (e.g., a micro BS or a pico BS), or a relay node. In some examples, the radio access node 20 is a network node that implements part of the functionality of a BS (e.g., a network node that implements a gNB central unit (gNB-CU) or a network node that implements a gNB distributed unit (gNB-DU)) or a network node that implements part of the functionality of some other type of radio access node.


In an exemplary aspect, the radio access node 20 includes an antenna array 26 (e.g., having B antennas), which can be operated for MU-MIMO communication with multiple UEs 28 (e.g., U UEs 28). Accordingly, the radio access node 20 is configured to perform spatial equalization in uplink communications (e.g., communications from the UEs 28 to a core network 30 of the wireless communications system 18). The purpose of spatial equalization is to beamform received signals by collecting the signals from all U UEs 28 at B antennas of the antenna array 26, while suppressing inter-UE interference. After this beamforming, these signals may be further processed via baseband processing circuitry 32 and forwarded to the core network 30. In some embodiments, the radio access node 20 is further configured to perform precoding in downlink communications (e.g., from the core network 30 to the UEs 28). Precoding similarly beamforms signals to be transmitted to the UEs 28 to maximize the reception of a transmitted signal at its target UE 28.


In an exemplary aspect, the antenna array 26 of the radio access node 20 includes a large number of B antennas (e.g., at least B=64 antennas). As described above, for such a large antenna array 26, a conventional matrix-vector-product circuit consumes a large amount of power. Consequently, the finite-alphabet beamforming circuitry (e.g., the finite-alphabet equalizer 22 and/or the finite-alphabet precoder 24) provides more efficient spatial equalization in order to minimize power consumption and semiconductor area (which translate to system costs), while achieving high spectral efficiency.


The finite-alphabet equalizer 22 and the finite-alphabet precoder 24 beamform digital baseband signals based on a wireless channel estimation provided by channel estimator circuitry 34 using matrix-vector products. The matrix-vector products required for spatial equalization and precoding involve multiplications and additions, where the hardware multipliers dominate power and area. The area and delay of a hardware multiplier scales with O(mn) and O(log(max{m, n})), respectively, where m and n are the number of bits of each operand. Therefore, circuit area, delay, and power consumption (which is roughly proportional to circuit area) of a matrix-vector-product engine can be minimized by using a low number of bits to represent both operands.
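The area scaling quoted above can be illustrated with a rough sketch. The operand widths below are hypothetical values chosen only for illustration, the constant factor hidden by the O(mn) notation is ignored, and delay is not modeled, so the ratio is a proxy rather than a hardware result.

```python
def relative_multiplier_area(m_bits, n_bits):
    """Area proxy for an m-bit by n-bit multiplier under O(mn) scaling (constant ignored)."""
    return m_bits * n_bits

# Hypothetical operand widths, not taken from the disclosure.
high_res = relative_multiplier_area(12, 8)  # 12-bit coefficient times 8-bit sample
low_res = relative_multiplier_area(2, 8)    # 2-bit coefficient times 8-bit sample
area_ratio = high_res / low_res
print(area_ratio)
```

The proxy only captures the trend that narrowing one operand shrinks the multiplier roughly in proportion; measured savings depend on the actual circuit.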


In this regard, Section I below focuses on application of finite-alphabet equalization to uplink communications using the finite-alphabet equalizer 22. Section II adapts the finite-alphabet equalization approach to downlink communications using the finite-alphabet precoder 24.


Notation

Matrices and column vectors are represented with uppercase and lowercase boldface letters, respectively. The Hermitian transpose and the Frobenius norm of a matrix $A$ are denoted by $A^H$ and $\|A\|_F$, respectively. The real part of a complex-valued matrix $A$ is $\Re\{A\}$ and the imaginary part is $\Im\{A\}$. The $M \times M$ identity matrix is denoted by $I_M$. The $k$th entry and the $\ell_2$-norm of a vector $a$ are $a_k$ and $\|a\|_2$, respectively; the entry-wise complex conjugate is denoted by $a^*$. The $k$th standard basis vector is represented by $e_k$. The signum function $\mathrm{sgn}(\cdot)$ is defined as $\mathrm{sgn}(a) = +1$ for $a \geq 0$ and $\mathrm{sgn}(a) = -1$ for $a < 0$ and is applied entry-wise to vectors. $\mathbb{E}_x[\cdot]$ is used to denote expectation with respect to the random vector $x$. The set of all positive semidefinite matrices and the set of all non-negative real numbers are each denoted with a subscript +; the latter is written $\mathbb{R}_+$.


I. Finite-Alphabet Equalizer (Uplink)

With continuing reference to FIG. 2, traditional approaches have extensively explored the efficacy of low-resolution data converters 36 (e.g., analog-to-digital converters (ADC) and/or digital-to-analog converters (DAC)) at the antenna array 26 of massive MU-MIMO systems. Depending on the scenario, 3 to 8 bits have been shown to achieve near-optimal spectral efficiency. Such methods reduce the precision of one of the operands (i.e., that of the received vector for uplink communications) in a matrix-vector product. However, the coefficients of the equalization matrix (the other operand) are typically left at relatively high precision (e.g., 10 to 12 bits).


To reduce power consumption and implementation costs of spatial equalization, the finite-alphabet equalizer 22 coarsely quantizes the coefficients of one or more spatial equalization matrices. In contrast to approaches that use low-resolution ADCs to quantize the received vector to be equalized, finite-alphabet equalization coarsely quantizes the entries of the spatial equalization matrix. While this approach appears straightforward, obtaining low-resolution finite-alphabet equalization matrices that achieve high spectral efficiency is a hard problem.



FIG. 3 is a graphical representation of uncoded bit error-rate (BER) for spatial equalization approaches, including the finite-alphabet equalizer 22 of FIG. 2. An unquantized linear MMSE (L-MMSE) equalizer is compared with a naïvely quantized finite-alphabet L-MMSE (FL-MMSE) equalizer that uses 1 bit per real and imaginary part and suffers from a high error floor. To combat this problem, the finite-alphabet equalizer 22 applies finite-alphabet MMSE equalization (FAME), which leads to an NP-hard optimization problem that can be solved approximately (and efficiently) using forward-backward splitting (FBS). This method is also referred to herein as FAME-FBS. As shown in FIG. 3, using FAME-FBS results in a substantially improved error rate compared to FL-MMSE equalization.
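As a point of reference for the FL-MMSE baseline, the sketch below shows one plausible form of naïve 1-bit quantization: taking the sign of the real and imaginary part of every matrix entry, using the signum convention from the Notation section. This passage does not spell out the exact quantizer, so the entry-wise sign rule and the function names are assumptions, and any per-UE rescaling is omitted.

```python
def sgn(a):
    """Signum with the convention sgn(a) = +1 for a >= 0 and -1 for a < 0."""
    return 1.0 if a >= 0 else -1.0

def naive_one_bit_quantize(W):
    """Map each entry of W (a list of rows) to {+1+j, +1-j, -1+j, -1-j} by taking signs."""
    return [[complex(sgn(w.real), sgn(w.imag)) for w in row] for row in W]

# Toy matrix with arbitrary values, used only to show the mapping.
W_example = [[0.3 - 0.2j, -1.5 + 0.0j]]
X_example = naive_one_bit_quantize(W_example)
```

Because the signs discard all magnitude information, such a quantizer cannot distinguish large coefficients from small ones, which is one intuition for the error floor mentioned above.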


The finite-alphabet equalizer 22 uses a specific finite-alphabet equalization-matrix structure that enables one to reduce the complexity of a U×B matrix-vector product by using U×B low-resolution coefficients, while still being able to deliver a performance similar to conventional, high-resolution spatial equalization matrices, as further described in subsection A below. The so-called FAME problem, whose solution leads to finite-alphabet equalization matrices that minimize the post-equalization mean-square error (MSE), is described in subsection B below. In subsection C, a range of algorithms is presented that approximately solve the NP-hard FAME problem; some of these algorithms achieve excellent performance even for 1-bit resolution, and some have very low complexity.


In subsection D, the finite-alphabet equalization approach is evaluated for line-of-sight (LoS) and non-LoS mmWave channel models, which demonstrate the efficacy of FAME in terms of error-vector magnitude (EVM), beamforming capabilities, and uncoded BER. In addition, finite-alphabet equalization circuits are implemented for different numbers of bits in 28 nanometer (nm) complementary metal-oxide-semiconductor (CMOS) technology to demonstrate the effectiveness of FAME in practice.


A. System Model

With continuing reference to FIG. 2, the U UEs 28 transmit data in the same time-frequency resource to the B antennas in the antenna array 26 of the radio access node 20. After estimating the channel with the channel estimator circuitry 34, the all-digital radio access node 20 applies spatial equalization to collect the signals from the individual UEs 28 and suppress inter-UE interference.


Focusing on the uplink of the wireless communications system 18 (e.g., a massive MU-MIMO system), the following narrowband input-output relation can be defined:

y=Hs+n  Equation 1

Here, $y \in \mathbb{C}^B$ is the received signal vector at the radio access node 20, $H \in \mathbb{C}^{B \times U}$ is the known uplink MIMO channel matrix, $s \in \mathcal{S}^U$ is the transmit data vector, where $\mathcal{S}$ is the constellation set (e.g., 16-quadrature amplitude modulation (QAM)), and $n \in \mathbb{C}^B$ is independent and identically distributed (i.i.d.) circularly-symmetric complex Gaussian noise with covariance matrix $C_n = \mathbb{E}_n[n n^H] = N_0 I_B$, i.e., with variance $N_0$ per complex entry. In what follows, it is assumed that the transmit signals of the UEs 28, $s_u$, $u = 1, \ldots, U$, are i.i.d. zero mean with variance $E_s$ so that $C_s = \mathbb{E}_s[s s^H] = E_s I_U$.
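The input-output relation of Equation 1 can be simulated with a few lines of code. The sketch below is illustrative only: the toy dimensions, the i.i.d. Rayleigh-fading channel draw, the unit-energy QPSK constellation, and the function name simulate_uplink are all assumptions introduced for the example.

```python
import math
import random

def simulate_uplink(H, s, n0, rng):
    """Return y = H s + n with i.i.d. circularly-symmetric complex Gaussian noise of variance n0."""
    sigma = math.sqrt(n0 / 2.0)  # standard deviation per real dimension
    y = []
    for row in H:
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        y.append(sum(h * x for h, x in zip(row, s)) + noise)
    return y

rng = random.Random(0)
B, U = 4, 2  # toy sizes, far smaller than a massive MU-MIMO array
std = math.sqrt(0.5)
# Assumed channel model: i.i.d. complex Gaussian entries with unit variance.
H = [[complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(U)] for _ in range(B)]
# Unit-energy QPSK constellation, so Es = 1 in this example.
qpsk = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
s = [rng.choice(qpsk) for _ in range(U)]
y = simulate_uplink(H, s, n0=0.1, rng=rng)
```

With n0 set to zero the function returns the noiseless product Hs, which is a convenient sanity check when experimenting with equalizers.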


It should be understood that the input-output relation in Equation 1 is not only valid to model narrowband transmission, but can also be used to model the subcarriers of a wideband massive MU-MIMO system that uses orthogonal frequency-division multiplexing (OFDM) or single-carrier frequency-division multiple access (SC-FDMA). The theory and algorithms developed in the remainder of this disclosure can be generalized for systems with inter-symbol interference.


For this model, it is assumed that the channel remains constant over multiple symbol transmissions and, hence, can be estimated. For the mathematical derivations, it is further assumed that perfect channel state information is available at the radio access node 20. For systems in which the UEs 28 use an antenna array to perform transmit beamforming, the channel matrix H represents the joint effect of beamforming and the physical channel.


A key task of the radio access node 20 is to form estimates $\hat{s} \in \mathbb{C}^U$ of the transmitted data vector $s$. To develop methods that are computationally efficient and hardware friendly, embodiments focus on linear spatial estimators of the form $\hat{s} = W^H y$, where $W^H \in \mathbb{C}^{U \times B}$ is the L-MMSE equalization matrix that minimizes the post-equalization MSE defined as:

$$\mathrm{MSE} = \mathbb{E}_{s,n}\left[\|W^H y - s\|_2^2\right]$$  Equation 2

Given the assumptions on the statistics of the transmit data and noise vectors, s and n:

$$\mathrm{MSE} = E_s \|W^H H - I_U\|_F^2 + N_0 \|W^H\|_F^2$$  Equation 3
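The step from Equation 2 to Equation 3 can be spelled out as follows. This is a standard expansion that substitutes Equation 1 and uses the stated second-order statistics; it additionally assumes that the noise is zero mean and that s and n are statistically independent, so that the cross terms vanish.

```latex
\begin{aligned}
\mathbb{E}_{s,n}\!\left[\|W^H y - s\|_2^2\right]
&= \mathbb{E}_{s,n}\!\left[\|(W^H H - I_U)\,s + W^H n\|_2^2\right] \\
&= \mathbb{E}_{s}\!\left[\|(W^H H - I_U)\,s\|_2^2\right] + \mathbb{E}_{n}\!\left[\|W^H n\|_2^2\right] \\
&= E_s\,\|W^H H - I_U\|_F^2 + N_0\,\|W^H\|_F^2
\end{aligned}
```

The last line uses the identity $\mathbb{E}_x[\|A x\|_2^2] = \mathrm{tr}(A C_x A^H)$ with $C_s = E_s I_U$ and $C_n = N_0 I_B$. Dividing the result by $E_s$ does not change the minimizer, which is where the ratio $N_0/E_s$ comes from.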


Hence, the L-MMSE equalization matrix can be obtained by solving the following matrix least-squares problem:

$$W^H = \arg\min_{\tilde{W}^H \in \mathbb{C}^{U \times B}} \|I_U - \tilde{W}^H H\|_F^2 + \rho \|\tilde{W}^H\|_F^2$$  Equation 4

with regularization parameter ρ=N0/Es. This optimization problem has a closed-form solution given by:

$$W^H = (\rho I_U + H^H H)^{-1} H^H$$  Equation 5

which can be computed efficiently in hardware.
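For readers who want to experiment with the closed-form solution of Equation 5, the following floating-point sketch evaluates it for a small channel matrix using only the Python standard library. It is a software illustration under assumed helper names (hermitian, matmul, solve, lmmse_matrix) and a toy channel; it does not represent the hardware computation referred to above.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot_row] = M[pivot_row], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def lmmse_matrix(H, rho):
    """Return W^H = (rho I_U + H^H H)^{-1} H^H for a B x U channel matrix H."""
    HH = hermitian(H)
    gram = matmul(HH, H)
    U = len(gram)
    regularized = [[gram[i][j] + (rho if i == j else 0.0) for j in range(U)]
                   for i in range(U)]
    return solve(regularized, HH)

# Toy channel with B = 3 antennas and U = 2 UEs (arbitrary values).
H_toy = [[1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1j]]
W_H = lmmse_matrix(H_toy, rho=0.1)
```

Solving the U×U linear system instead of forming an explicit inverse is a common numerical choice; with rho set to zero the result reduces to a left inverse of H whenever H has full column rank.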


Alternatively, the rows $w_u^H$, $u = 1, \ldots, U$, of the L-MMSE equalization matrix $W^H$ can be computed by solving:

$$w_u = \arg\min_{\tilde{w} \in \mathbb{C}^B} \|e_u - H^H \tilde{w}\|_2^2 + \rho \|\tilde{w}\|_2^2$$  Equation 6

This alternative formulation of the L-MMSE equalizer is useful in solving the FAME problem, as detailed in the next subsection.
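The equivalence between the matrix problem of Equation 4 and the per-row problems of Equation 6 follows from the fact that the squared Frobenius norm is the sum of the squared row norms. Writing $\tilde{w}_u^H$ for the $u$th row of $\tilde{W}^H$ (a symbol introduced here for illustration), the $u$th row of $I_U - \tilde{W}^H H$ is $e_u^H - \tilde{w}_u^H H$, and taking the Hermitian transpose leaves its norm unchanged:

```latex
\|I_U - \tilde{W}^H H\|_F^2 + \rho\,\|\tilde{W}^H\|_F^2
= \sum_{u=1}^{U} \left( \|e_u - H^H \tilde{w}_u\|_2^2 + \rho\,\|\tilde{w}_u\|_2^2 \right)
```

Each summand depends on a single row only, so the rows can be optimized independently of one another.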


B. FAME: Finite-Alphabet MMSE Equalization

Embodiments of the finite-alphabet equalizer 22 of FIG. 2 operate under the finite-alphabet equalization paradigm described in this subsection. A finite-alphabet equalization matrix is defined that enables efficient hardware for low-cardinality alphabets. Then, the FAME problem is formulated, which computes the finite-alphabet equalization matrix that minimizes the post-equalization MSE. Finally, a simple approach is presented to compute finite-alphabet equalization matrices and compare performance of the finite-alphabet equalizer 22 to an equalizer that solves the FAME problem exactly.


Linear equalization in hardware requires the computation of an inner product $\hat{s}_u = \langle w_u, y \rangle = w_u^H y$ per UE for every received vector $y$. As described above, executing even such simple computations at the bandwidth offered by mmWave systems can result in excessively large area and high power consumption. To reduce both the area and power consumption, embodiments of the finite-alphabet equalizer 22 reduce the numerical precision of the equalization vectors $w_u$, $u = 1, \ldots, U$. In the extreme case where the entries of $w_u$ are quantized using 1 bit per real and imaginary component, an inner-product computation would only require additions and subtractions; this is significantly less costly (in area and power) than using high-precision multipliers.
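The claim that a 1-bit inner product needs only additions and subtractions can be checked with a short sketch. Each coefficient is represented here as a pair of signs (a, b) standing for a + jb with a, b in {+1, -1}; this representation and the function name are assumptions made for the example.

```python
def one_bit_inner_product(x_signs, y):
    """Compute x^H y for x_b = a_b + j*b_b with a_b, b_b in {+1, -1}, without multiplications."""
    acc_re, acc_im = 0.0, 0.0
    for (a, b), sample in zip(x_signs, y):
        yr, yi = sample.real, sample.imag
        # conj(a + jb) * (yr + j*yi) = (a*yr + b*yi) + j*(a*yi - b*yr)
        acc_re += yr if a > 0 else -yr
        acc_re += yi if b > 0 else -yi
        acc_im += yi if a > 0 else -yi
        acc_im -= yr if b > 0 else -yr
    return complex(acc_re, acc_im)

# Arbitrary toy data for three antennas.
x_signs = [(1, 1), (1, -1), (-1, 1)]
y_toy = [0.5 + 1.5j, -2.0 + 0.25j, 1.0 - 1.0j]
result = one_bit_inner_product(x_signs, y_toy)
reference = sum(complex(a, b).conjugate() * v for (a, b), v in zip(x_signs, y_toy))
```

The sign tests select between adding and subtracting each real or imaginary part, so the loop body contains no multiplier; the reference value computed with ordinary complex multiplication is included only for comparison.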


However, reducing the numerical precision of the equalization vectors $w_u$ will affect the MSE and eventually the error-rate performance. Furthermore, quantization to, e.g., the finite alphabet $\mathcal{X} = \{+1+j, +1-j, -1+j, -1-j\}$ will result in numerical-range issues, meaning that such matrices will not be able to represent large or small entries. To mitigate both of these issues, a principled way to perform equalization with finite-alphabet matrices is developed.


A U×B finite-alphabet equalization matrix is defined as follows:

$$V^H = \mathrm{diag}(\beta^*) X^H$$  Equation 7

Here, $\beta \in \mathbb{C}^U$ is a vector that consists of post-equalization scaling factors and $X^H \in \mathcal{X}^{U \times B}$ is an equalization matrix with entries chosen from the finite alphabet $\mathcal{X}$.


Embodiments of the finite-alphabet equalizer 22 use finite alphabets $\mathcal{X}$ of low cardinality whose elements can be represented using a small number of bits (e.g., 8 bits or less). An example is the 1-bit finite alphabet $\mathcal{X} = \{+1+j, +1-j, -1+j, -1-j\}$, which uses 1 bit per real and imaginary component.


With Equation 7, the equalized received symbol for the uth UE 28 is given by:

$$\hat{s}_u = v_u^H y = \beta_u^* x_u^H y$$  Equation 8

where $v_u^H \in \mathbb{C}^{1 \times B}$ and $x_u^H \in \mathcal{X}^{1 \times B}$ are the $u$th rows of the matrices $V^H$ and $X^H$, respectively. The spatial equalization as in Equation 8 allows for efficient circuit implementations, especially for finite alphabets with low cardinality and regularly spaced elements. For such matrices, the inner product $x_u^H y$ can be implemented using low-resolution multipliers. As $\beta \in \mathbb{C}^U$, the post-equalization scaling operation by the scalar factor $\beta_u^*$ is performed using high-resolution multipliers. Nonetheless, this operation is executed only once per UE 28. Section I-D below illustrates that equalizer implementations which leverage finite-alphabet equalization matrices enable significant area and power savings.


From the above equations, FAME is developed as a principled method to compute MSE-optimal finite-alphabet equalization matrices. Analogous to the derivation of the L-MMSE equalizer in Equation 6, FAME is interested in minimizing the post-equalization MSE:

$$\text{FA-MSE} = \mathbb{E}_{s,n}\left[\|V^H y - s\|_2^2\right]$$  Equation 9

with the difference that $V^H = \mathrm{diag}(\beta^*)X^H$ is now a finite-alphabet equalization matrix as per Equation 7. From Equation 6, it follows that the rows $v_u^H = \beta_u^* x_u^H$, $u = 1, \ldots, U$, of such a FAME matrix can be computed by solving the following optimization problem:

$$\{\beta_u, x_u\} = \arg\min_{\tilde{\beta} \in \mathbb{C},\ \tilde{x} \in \mathcal{X}^B} \|e_u - H^H \tilde{\beta}\tilde{x}\|_2^2 + \rho\,\|\tilde{\beta}\tilde{x}\|_2^2$$  Equation 10


Intuitively, embodiments seek to find the finite-alphabet equalization vectors vuH=β*uxuH, u=1, . . . , U, that best mimic the infinite-precision L-MMSE equalizer.


For a fixed scaling factor $\beta_u$, the FAME problem in Equation 10 is a closest vector problem, which is known to be NP-hard. For example, for a system with an antenna array 26 of $B = 256$ antennas using a 1-bit finite-alphabet equalization matrix, solving the FAME problem using an exhaustive search would require one to evaluate the objective function in Equation 10 more than $10^{154}$ times for each UE 28. Clearly, without low-complexity algorithms, the FAME problem cannot be solved in practical massive MU-MIMO mmWave systems.
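The size of the exhaustive search can be checked with exact integer arithmetic. The short sketch below assumes only that each of the 256 entries is drawn independently from the four-element 1-bit alphabet; the variable names are illustrative.

```python
# Number of candidate vectors for a 1-bit alphabet with B = 256 antennas.
B = 256
alphabet_size = 4            # {+1+j, +1-j, -1+j, -1-j}
candidates = alphabet_size ** B  # exact Python integer, equals 2**512

exceeds_bound = candidates > 10 ** 154
num_digits = len(str(candidates))
```

Here `exceeds_bound` evaluates to `True` and `num_digits` is 155, consistent with the count of more than $10^{154}$ evaluations per UE.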


Since the FAME problem in Equation 10 minimizes the cost function for two quantities at once, i.e., the scaling factor βu and the low-resolution vector xu, it is not obvious how to solve it efficiently. To derive computationally efficient algorithms in subsection C below, the following equivalent form of the FAME problem is used. The FAME problem in Equation 10 is equivalent to solving the following optimization problem for each UE u=1, . . . , U:










$$x_u = \arg\min_{\tilde{x} \in \mathcal{X}^B} \frac{\|H^H \tilde{x}\|_2^2 + \rho\,\|\tilde{x}\|_2^2}{|h_u^H \tilde{x}|^2}$$  Equation 11








where the associated optimal scaling factor is given by:











$$\beta_u(x_u) = \frac{x_u^H h_u}{\|H^H x_u\|_2^2 + \rho\,\|x_u\|_2^2}$$  Equation 12







This formulation of the FAME problem facilitates first finding the optimal vector xu using Equation 11 and then computing the associated optimal scaling factor βu using Equation 12. Note that Equation 12 models the MSE optimal scaling factor βu for a given vector xu, regardless of how xu was computed.
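The relationship between Equations 10, 11, and 12 can be checked numerically. The sketch below is a non-limiting illustration: the helper names, the small 4×2 channel matrix, and the candidate vector are assumptions. It evaluates the ratio of Equation 11 for a given vector, computes the scaling factor of Equation 12, and confirms that this scaling factor attains the value $1 - 1/\gamma$ in the objective of Equation 10, where $\gamma$ is the ratio of Equation 11.

```python
# Illustrative check of Equations 10-12; all names and values are assumptions.

def herm_times_vec(H, x):
    """Return H^H x, with H stored as B rows of U complex entries."""
    n_rows, n_cols = len(H), len(H[0])
    return [sum(H[b][k].conjugate() * x[b] for b in range(n_rows))
            for k in range(n_cols)]

def energy(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(e) ** 2 for e in v)

def fame_ratio(H, x, u, rho):
    """Objective of Equation 11 for UE index u (0-based)."""
    a = herm_times_vec(H, x)
    return (energy(a) + rho * energy(x)) / abs(a[u]) ** 2

def fame_beta(H, x, u, rho):
    """Scaling factor of Equation 12; note x^H h_u = conj(h_u^H x)."""
    a = herm_times_vec(H, x)
    return a[u].conjugate() / (energy(a) + rho * energy(x))

def eq10_objective(H, x, u, rho, beta):
    """||e_u - H^H (beta x)||^2 + rho ||beta x||^2 (Equation 10)."""
    bx = [beta * e for e in x]
    a = herm_times_vec(H, bx)
    residual = [(1.0 if k == u else 0.0) - a[k] for k in range(len(a))]
    return energy(residual) + rho * energy(bx)

H = [[1 + 0j, 0.5j], [0.5 - 0.5j, 1 + 0j], [-0.25j, 0.75 + 0j], [1 + 0j, -0.5 + 0.25j]]
x = [1 + 1j, 1 - 1j, -1 + 1j, 1 + 1j]   # arbitrary 1-bit candidate vector
rho, u = 0.1, 0

beta = fame_beta(H, x, u, rho)
gamma = fame_ratio(H, x, u, rho)
best = eq10_objective(H, x, u, rho, beta)
```

Because the objective of Equation 10 is a strictly convex quadratic in the scaling factor, any perturbation of `beta` increases it, which is why minimizing the ratio of Equation 11 over the finite alphabet is sufficient.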


FL-MMSE: A Baseline Finite-Alphabet Equalizer: Since the FAME problem is NP-hard, a baseline method is presented to compute finite-alphabet equalization matrices as in Equation 7 without having to solve the FAME problem in Equation 11. This approach is referred to as FL-MMSE, as it obtains the entries of the low-resolution matrix XH by quantizing the L-MMSE equalizer in Equation 5. The corresponding scaling factors βu are then obtained using Equation 12. Throughout this disclosure, the FL-MMSE equalizer is used as a baseline method to evaluate the performance of embodiments of the finite-alphabet equalizer 22 that attempt to directly solve the FAME problem in Equation 11.


For the 1-bit case, FL-MMSE applies the signum function sgn(⋅) separately on the real and imaginary parts of the L-MMSE matrix WH to obtain XH. Then, FL-MMSE uses the optimal FAME scaling in Equation 12 to compute the high-resolution scaling factors in the vector β. FL-MMSE can also be used with finite-alphabets that have more than 1-bit per complex entry. In such cases, after computing the L-MMSE equalization matrix WH in Equation 5, the real and imaginary parts are quantized as follows.


For each row $w_u^H$ of $W^H$, the scalar $w_{\max}$ corresponding to the largest absolute value in $[\Re\{w_u^H\}, \Im\{w_u^H\}]$ is identified. Then, assuming that the targeted resolution is $r$ bits, the range $[-w_{\max}, +w_{\max}]$ is divided into $2^r$ uniform-width bins and the entries of $\Re\{w_u^H\}$ and $\Im\{w_u^H\}$ are quantized to the centroid values of these bins. For 2-bit resolution, for example, the centroid values of the bins are $\{-0.75, -0.25, +0.25, +0.75\}\,w_{\max}$. In hardware, one would scale these centroid values so that the minimum absolute value corresponds to 1. Following the previous example, one would use the values $\{-3, -1, +1, +3\}$ to represent the entries of $\Re\{x_u^H\}$ and $\Im\{x_u^H\}$. Note that this scaling does not affect the solution of the FAME problem in Equation 11, as it can be absorbed into the scaling factor $\beta_u$ in Equation 12. After obtaining the low-resolution vector, the corresponding scaling factor $\beta_u$ is computed using Equation 12.
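The quantization step just described can be sketched as follows. This is a minimal, non-limiting illustration: the function names and the example row are assumptions, and a value of exactly zero is assigned to the positive bin, which is one possible convention.

```python
import math

def quantize_uniform(value, w_max, r):
    """Map a real value in [-w_max, +w_max] to the odd-integer level of its bin.

    Level k in {-(2^r - 1), ..., -1, +1, ..., 2^r - 1} represents the
    bin centroid k * w_max / 2^r.
    """
    n_bins = 2 ** r
    width = 2.0 * w_max / n_bins
    idx = int(math.floor((value + w_max) / width))
    idx = min(max(idx, 0), n_bins - 1)  # value == +w_max falls in the last bin
    return 2 * idx - (n_bins - 1)

def fl_mmse_row(w_row, r):
    """Quantize real and imaginary parts of one equalizer row to r bits each."""
    w_max = max(max(abs(w.real), abs(w.imag)) for w in w_row)
    return [complex(quantize_uniform(w.real, w_max, r),
                    quantize_uniform(w.imag, w_max, r)) for w in w_row]

w_row = [0.8 - 0.1j, -0.3 + 0.5j, 0.05 - 0.8j]  # hypothetical high-resolution row
x_2bit = fl_mmse_row(w_row, 2)  # levels from {-3, -1, +1, +3}
x_1bit = fl_mmse_row(w_row, 1)  # levels from {-1, +1}
```

For the example row, `x_2bit` is `[3-1j, -1+3j, 1-3j]` and `x_1bit` is `[1-1j, -1+1j, 1-1j]`; the 1-bit case reduces to taking the sign of each component.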



FIGS. 4A-4C and 5A-5C illustrate EVM and beamforming performance of the optimal solution to the FAME problem in Equation 11 for a 1-bit finite alphabet. To solve the NP-hard problem, these examples use an exhaustive search, referred to as FAME-EXH, for the finite-alphabet equalizer 22 of FIG. 2. To keep the complexity within reasonable bounds, a small MU-MIMO system with a B=8 antenna array 26 is simulated. For comparison, the performance of conventional, infinite-precision L-MMSE equalization and 1-bit FL-MMSE equalization are simulated.
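An exhaustive search of the kind referred to as FAME-EXH can be sketched for a very small array. The sketch below is a non-limiting illustration: it assumes $B = 4$ antennas (256 candidates), a hypothetical 4×2 channel matrix, and illustrative function names. For comparison it also evaluates a sign-quantized copy of the channel column $h_u$, which is an assumption of this sketch and not the FL-MMSE baseline.

```python
import itertools

ALPHABET = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)

def fame_ratio(H, x, u, rho):
    """Objective of Equation 11; returns inf when h_u^H x = 0."""
    n_rows, n_cols = len(H), len(H[0])
    a = [sum(H[b][k].conjugate() * x[b] for b in range(n_rows))
         for k in range(n_cols)]
    signal = abs(a[u]) ** 2
    if signal == 0.0:
        return float("inf")
    total = sum(abs(e) ** 2 for e in a) + rho * sum(abs(e) ** 2 for e in x)
    return total / signal

def fame_exh(H, u, rho):
    """Evaluate every vector in ALPHABET^B and keep the smallest ratio."""
    best_x, best_val = None, float("inf")
    for cand in itertools.product(ALPHABET, repeat=len(H)):
        val = fame_ratio(H, cand, u, rho)
        if val < best_val:
            best_x, best_val = list(cand), val
    return best_x, best_val

def sign_quantize(v):
    """Quantize real and imaginary parts to +1 or -1."""
    return [complex(1.0 if e.real >= 0 else -1.0,
                    1.0 if e.imag >= 0 else -1.0) for e in v]

H = [[1 + 0j, 0.5j], [0.5 - 0.5j, 1 + 0j], [-0.25j, 0.75 + 0j], [1 + 0j, -0.5 + 0.25j]]
rho = 0.1
x_exh, gamma_exh = fame_exh(H, 0, rho)
gamma_sign = fame_ratio(H, sign_quantize([row[0] for row in H]), 0, rho)
```

The number of candidates grows as $4^B$, so this approach is only practical for arrays of a few antennas.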



FIG. 4A is a graphical representation of EVM performance for the L-MMSE equalizer. FIG. 4B is a graphical representation of EVM performance for the naïve FL-MMSE equalizer. FIG. 4C is a graphical representation of EVM performance for a FAME-EXH embodiment of the finite-alphabet equalizer 22 of FIG. 2. FIGS. 4A-4C show scatter plots of the equalization outputs $\hat{s}$ for 2,000 realizations for these embodiments of the wireless communications system 18 operating over an i.i.d. Rayleigh-fading channel at 15 decibels (dB) SNR.


While the infinite-precision L-MMSE equalizer achieves an EVM of 11.58%, quantizing its solution to 1-bit using FL-MMSE degrades the EVM to 30.58%, which blurs the decision regions of the considered 16-QAM constellation. In stark contrast, the 1-bit FAME-EXH equalizer achieves an EVM of only 15.30%, which is close to that of the infinite-precision L-MMSE equalizer; furthermore, the decision regions between constellation points are clearly visible. These results demonstrate the significant EVM advantage of solving the FAME problem over the simple FL-MMSE equalizer.



FIG. 5A is a graphical representation of beam- and null-forming capabilities for the L-MMSE equalizer. FIG. 5B is a graphical representation of beam- and null-forming capabilities for the FL-MMSE equalizer. FIG. 5C is a graphical representation of beam- and null-forming capabilities for the FAME-EXH embodiment of the finite-alphabet equalizer 22 of FIG. 2. For FIGS. 5A-5C, these embodiments of the wireless communications system 18 operating at 15 dB SNR over a textbook LoS channel were modeled, where the channel coefficient between the bth antenna of the antenna array 26 and a UE 28 located at an angle of ϕ is modeled as follows:

$$h_b(\phi) = e^{-j\pi(b-1)\cos(\phi)}, \quad b = 1, \ldots, B$$  Equation 13

Here, a uniform linear array (ULA) of antennas with half-wavelength antenna spacing and constant path loss is assumed. A primary UE 28 is located at an angle of $\phi_1 = 60°$ and a secondary UE 28 is located at $\phi_2 = 120°$. Next, the corresponding equalization matrix is computed using L-MMSE, FL-MMSE, and FAME-EXH equalization. An evaluation is performed of how much the equalization vector $v_1^H$ (which corresponds to the UE at $\phi_1 = 60°$) captures (or rejects) signals incoming from different incident angles by evaluating $|v_1^H h(\phi')|^2$ for $0 \le \phi' \le \pi$. The equalization vector $v_1^H$ should amplify the signal from the primary UE 28 at $\phi_1 = 60°$ but attenuate the signal from the secondary UE 28.
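The angular response evaluation described above can be sketched as follows. As a non-limiting illustration, the sketch uses a normalized copy of the channel vector at 60° as the equalization vector (an MRC-style choice assumed here for simplicity, not one of the equalizers compared in FIGS. 5A-5C); the function names and the 5° grid are likewise assumptions.

```python
import cmath
import math

def ula_response(phi, n_antennas):
    """Equation 13 with b - 1 = 0, ..., B - 1: exp(-j*pi*(b-1)*cos(phi))."""
    return [cmath.exp(-1j * math.pi * b * math.cos(phi))
            for b in range(n_antennas)]

def beam_gain(v, phi):
    """|v^H h(phi)|^2 for an equalization vector v."""
    h = ula_response(phi, len(v))
    return abs(sum(vb.conjugate() * hb for vb, hb in zip(v, h))) ** 2

B = 8
phi1 = math.radians(60.0)
# Hypothetical equalization vector: channel at 60 degrees, scaled by 1/B.
v_mrc = [hb / B for hb in ula_response(phi1, B)]

# Sweep the incident angle from 0 to 180 degrees in 5-degree steps.
pattern = [(deg, beam_gain(v_mrc, math.radians(deg))) for deg in range(0, 181, 5)]
peak_angle = max(pattern, key=lambda p: p[1])[0]
```

Replacing `v_mrc` with a row of an L-MMSE, FL-MMSE, or FAME-EXH equalization matrix would give the kind of beam- and null-forming curves the figures describe.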


The results shown in FIGS. 5A-5C demonstrate that the infinite-precision L-MMSE equalization is able to simultaneously beamform towards the primary UE 28 and null interference from the secondary UE 28. The 1-bit FL-MMSE equalizer is unable to reject interference from the secondary UE 28. In stark contrast, the 1-bit FAME-EXH equalization is able to both beamform towards the primary UE 28 and null-form towards the secondary UE 28.


Despite the significant performance advantages of 1-bit FAME-EXH over 1-bit FL-MMSE, solving 1-bit FAME-EXH for large-dimensional problems that arise in mmWave systems is infeasible in practice. To this end, low-complexity FAME solvers are developed that scale to large antenna arrays 26.


C. Fast Algorithms to Solve FAME

Approximate algorithms are presented to solve the FAME problem efficiently for a radio access node 20 with a large antenna array 26. This begins with proposing a semidefinite relaxation (SDR)-based method and then developing a much faster method that uses forward-backward splitting (FBS).


FAME with SDR: First, SDR is used to solve the FAME problem in Equation 11 for a 1-bit finite alphabet. To do so, the FAME problem is re-expressed in the real domain using the quantities:










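$$\underline{x} = \begin{bmatrix} \Re\{x\} \\ \Im\{x\} \end{bmatrix} \quad \text{and} \quad \underline{H} = \begin{bmatrix} \Re\{H\} & -\Im\{H\} \\ \Im\{H\} & \Re\{H\} \end{bmatrix}$$  Equation 14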








Throughout, it is assumed that $\Re\{x\}$ and $\Im\{x\}$ take values from the same alphabet $\mathcal{X}$. For example, for 1-bit finite alphabets, $\mathcal{X} = \{-1, +1\}$. The FAME problem in Equation 11 can now be rewritten as:










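$$\underline{x}_u = \arg\min_{\tilde{\underline{x}} \in \mathcal{X}^{2B}} \frac{\|\underline{H}^H \tilde{\underline{x}}\|_2^2 + \rho\,\|\tilde{\underline{x}}\|_2^2}{|\underline{h}_u^H \tilde{\underline{x}}|^2}$$  Equation 15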







It is now key to realize that the vector $\tilde{\underline{x}}$ can be scaled arbitrarily without changing the objective function of Equation 15. This observation makes it possible to state an equivalent optimization problem:









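$$\begin{cases} \underset{\tilde{\underline{x}} \in \mathcal{Z}_\alpha^{2B},\ \alpha > 0}{\text{minimize}} & \|\underline{H}^H \tilde{\underline{x}}\|_2^2 + \rho\,\|\tilde{\underline{x}}\|_2^2 \\ \text{subject to} & |\underline{h}_u^H \tilde{\underline{x}}|^2 = 1 \end{cases}$$  Equation 16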








where the discrete set $\mathcal{Z}_\alpha$ is a scaled version of $\mathcal{X}$; for 1-bit finite alphabets, $\mathcal{Z}_\alpha = \{-\alpha, +\alpha\}$ with $\alpha > 0$. This form of the problem allows a semidefinite program to be formulated that solves the FAME problem approximately.
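The real-valued decomposition of Equation 14 can be checked numerically. The sketch below is a non-limiting illustration with assumed helper names and an assumed 4×2 channel matrix; it confirms that the transposed real-valued matrix applied to the stacked real vector yields the stacked real and imaginary parts of $H^H x$, so the norms appearing in Equations 11 and 15 coincide.

```python
def real_stack(x):
    """Stack real parts over imaginary parts of a complex vector."""
    return [v.real for v in x] + [v.imag for v in x]

def real_matrix(H):
    """Build [[Re H, -Im H], [Im H, Re H]] as a list of 2B rows of 2U entries."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom

def transpose_times(Hr, xr):
    """Return Hr^T xr for a real matrix stored as a list of rows."""
    n_rows, n_cols = len(Hr), len(Hr[0])
    return [sum(Hr[i][j] * xr[i] for i in range(n_rows)) for j in range(n_cols)]

def herm_times_vec(H, x):
    """Return H^H x for a complex matrix stored as a list of rows."""
    n_rows, n_cols = len(H), len(H[0])
    return [sum(H[b][k].conjugate() * x[b] for b in range(n_rows))
            for k in range(n_cols)]

H = [[1 + 0j, 0.5j], [0.5 - 0.5j, 1 + 0j], [-0.25j, 0.75 + 0j], [1 + 0j, -0.5 + 0.25j]]
x = [1 + 1j, 1 - 1j, -1 + 1j, 1 + 1j]

real_result = transpose_times(real_matrix(H), real_stack(x))
complex_result = real_stack(herm_times_vec(H, x))
```

The two lists agree entry by entry, which is what allows the complex-valued problem to be restated over real-valued quantities.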


By focusing on 1-bit finite alphabets, Equation 16 can be relaxed by replacing the constraint $|\underline{h}_u^H \tilde{\underline{x}}|^2 = 1$ by $\underline{h}_u^H \underline{X}\,\underline{h}_u = 1$, where the positive semidefinite matrix $\underline{X} \in \mathbb{S}_+^{2B}$ should approximate $\tilde{\underline{x}}\,\tilde{\underline{x}}^H$. This SDR yields









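$$\begin{cases} \underset{\underline{X} \in \mathbb{S}_+^{2B}}{\text{minimize}} & \mathrm{tr}\left(\left(\underline{H}\,\underline{H}^H + \rho I_{2B}\right)\underline{X}\right) \\ \text{subject to} & \underline{h}_u^H \underline{X}\,\underline{h}_u = 1 \\ & \underline{X}_{1,1} = \underline{X}_{b,b}, \quad b = 2, \ldots, 2B \end{cases}$$  Equation 17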








where the diagonal elements of $\underline{X} \in \mathbb{S}_+^{2B}$ are made equal (without specifying their value). This constraint is a result of the fact that the solution of interest lies in the set $\mathcal{Z}_\alpha$, where the parameter $\alpha$ is not known. After solving the semidefinite program in Equation 17, the finite-alphabet vector is computed by first extracting the leading eigenvector of the solution matrix $\underline{X}$ and then quantizing it to $\{-1, +1\}$ using the signum function $\mathrm{sgn}(\cdot)$. The equalization vector can then be scaled using the optimal FAME scaling parameter in Equation 12. This procedure is referred to herein as FAME-SDR.
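The step from Equation 16 to Equation 17 can be written out as follows. This is a sketch of the standard lifting argument, given under the assumption that the real-valued quantities are as defined in Equation 14: the objective is rewritten as a trace in the rank-one matrix formed from the candidate vector, the equal-diagonal constraint follows from the 1-bit alphabet, and the relaxation consists of dropping the rank-one requirement.

```latex
\begin{align*}
\|\underline{H}^H \tilde{\underline{x}}\|_2^2 + \rho\,\|\tilde{\underline{x}}\|_2^2
  &= \tilde{\underline{x}}^H \left(\underline{H}\,\underline{H}^H + \rho I_{2B}\right) \tilde{\underline{x}} \\
  &= \mathrm{tr}\!\left(\left(\underline{H}\,\underline{H}^H + \rho I_{2B}\right)
     \tilde{\underline{x}}\,\tilde{\underline{x}}^H\right), \\
\left[\tilde{\underline{x}}\,\tilde{\underline{x}}^H\right]_{b,b} &= \alpha^2,
  \quad b = 1, \ldots, 2B,
  \quad \text{for } \tilde{\underline{x}} \in \{-\alpha, +\alpha\}^{2B}.
\end{align*}
```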


While FAME-SDR can also be derived for multi-bit finite alphabets, this approach is not pursued for the following reasons. As described further below, the complexity of FAME-SDR does not scale well to a large number of BS antennas. Moreover, FAME-SDR cannot be applied to finite alphabets that are not separable into real and imaginary parts, such as a finite alphabet that contains the elements of an 8-phase shift keying (PSK) constellation. In addition, SDR can only handle finite alphabets with even cardinality that exclude a zero element. To avoid the drawbacks of SDR for FAME, an alternative approach is presented.


FAME with FBS: Due to the high complexity of FAME-SDR and the fact that SDR solvers are notoriously difficult to implement in hardware, a low-complexity alternative for solving the FAME problem approximately is presented next. To do so, it is assumed that, for each UE u=1, . . . , U, the optimal value of the objective in Equation 11 is known and denoted by γu. Mathematically:











$$\gamma_u = \frac{\|H^H x_u\|_2^2 + \rho\,\|x_u\|_2^2}{|h_u^H x_u|^2},$$  Equation 18








where $x_u$ is the solution to the problem in Equation 11. Note that it follows from Equation 18 that $\gamma_u > 1$. Rearranging Equation 18 yields:

$$0 = \|H^H x_u\|_2^2 + \rho\,\|x_u\|_2^2 - \gamma_u\,|h_u^H x_u|^2$$  Equation 19

Thus, if $\gamma_u$ were known, solving the problem:










$$x_u = \arg\min_{\tilde{x} \in \mathcal{X}^B} \frac{1}{2}\|H^H \tilde{x}\|_2^2 + \frac{\rho}{2}\|\tilde{x}\|_2^2 - \frac{\gamma_u}{2}|h_u^H \tilde{x}|^2$$  Equation 20








would yield the same solution as Equation 11. As the value of γu is unknown in practice, it is used as an algorithm parameter that is tuned to empirically improve the error-rate performance.


Since the problem in Equation 20 still contains a search over the finite-alphabet set $\mathcal{X}^B$, the non-convex constraint $\tilde{x} \in \mathcal{X}^B$ is relaxed to $\tilde{x} \in \mathcal{B}^B$. Here, $\mathcal{B}$ corresponds to the convex hull of the finite alphabet $\mathcal{X}$, which is defined as:











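$$\mathcal{B} = \left\{ \sum_{i=1}^{|\mathcal{X}|} \alpha_i \bar{x}_i \;\middle|\; \left(\alpha_i \in \mathbb{R}_+, \forall i\right) \wedge \sum_{i=1}^{|\mathcal{X}|} \alpha_i = 1 \right\},$$  Equation 21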








where $\bar{x}_i$ is the $i$th element of $\mathcal{X}$ and $i = 1, \ldots, |\mathcal{X}|$. After this relaxation step, the all-zeros vector $0_{B \times 1}$ becomes a trivial solution. To prevent the algorithm from returning this trivial solution, a term that encourages large entries in the vector $\tilde{x}$ is included in Equation 20. Specifically, the term $-\frac{\delta}{2}\|\tilde{x}\|_2^2$ is added to the objective function, where $\delta > 0$ is a regularization parameter. The resulting optimization problem is given by:










$$x_u = \arg\min_{\tilde{x} \in \mathcal{B}^B} \frac{1}{2}\|H^H \tilde{x}\|_2^2 - \frac{\gamma_u}{2}|h_u^H \tilde{x}|^2 + \frac{\rho - \delta}{2}\|\tilde{x}\|_2^2$$  Equation 22







FBS is used to compute a solution to Equation 22. FBS is an efficient, iterative solver for convex optimization problems of the form:

$$\hat{x} = \arg\min_{\tilde{x}} f(\tilde{x}) + g(\tilde{x})$$  Equation 23

where both functions $f$ and $g$ are convex, but $f$ is a smooth function and $g$ is not necessarily smooth or bounded. FBS executes the following operations for $t = 1, 2, \ldots, t_{\max}$ iterations or until convergence:

$$\tilde{z}^{(t+1)} = \tilde{x}^{(t)} - \tau^{(t)} \nabla f(\tilde{x}^{(t)})$$  Equation 24
$$\tilde{x}^{(t+1)} = \mathrm{prox}_g(\tilde{z}^{(t+1)}; \tau^{(t)})$$  Equation 25

Here, $\nabla f(\tilde{x}^{(t)})$ is the gradient of the function $f$, $\{\tau^{(t)} > 0\}$ is a sequence of step sizes, and $\mathrm{prox}_g(\cdot)$ is the proximal operator of the function $g$, defined as:











$$\mathrm{prox}_g(\tilde{z}; \tau) = \arg\min_{\tilde{x}} \left\{ \tau g(\tilde{x}) + \frac{1}{2}\|\tilde{x} - \tilde{z}\|_2^2 \right\}$$  Equation 26







The problem in Equation 22 is not convex and hence, FBS is not guaranteed to converge to an optimal solution. Nevertheless, FBS can be used to find approximate solutions to Equation 22 by setting:










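$$f(\tilde{x}) = \frac{1}{2}\|H^H \tilde{x}\|_2^2 - \frac{\gamma_u}{2}|h_u^H \tilde{x}|^2$$  Equation 27

$$g(\tilde{x}) = \mathbb{1}_{\mathcal{B}^B}(\tilde{x}) + \frac{\rho - \delta}{2}\|\tilde{x}\|_2^2$$  Equation 28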








Here, the convex constraint $\tilde{x} \in \mathcal{B}^B$ in Equation 22 is incorporated into the function $g(\tilde{x})$ via the indicator function $\mathbb{1}_{\mathcal{B}^B}(\tilde{x})$, which is zero if $\tilde{x} \in \mathcal{B}^B$ and infinity otherwise. With these definitions:

$$\nabla f(\tilde{x}) = H H^H \tilde{x} - \gamma_u h_u h_u^H \tilde{x}$$  Equation 29
$$\mathrm{prox}_g(\tilde{z}) = \mathrm{sgn}(\Re\{\tilde{z}\}) \min\{\nu^{(t)} |\Re\{\tilde{z}\}|, 1\} + j\,\mathrm{sgn}(\Im\{\tilde{z}\}) \min\{\nu^{(t)} |\Im\{\tilde{z}\}|, 1\}$$  Equation 30

where $\nu^{(t)} = (1 + \tau^{(t)}(\rho - \delta))^{-1}$ and Equation 30 is applied element-wise to the vector $\tilde{z}$.
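The proximal operator of Equation 30 amounts to scaling each real and imaginary component by $\nu^{(t)}$ and clipping the result to the interval from −1 to +1. A minimal sketch for a single complex entry is given below; the function names and the parameter values are assumptions chosen for illustration only.

```python
def nu_from(tau, rho, delta):
    """Shrinkage factor nu = 1 / (1 + tau * (rho - delta))."""
    return 1.0 / (1.0 + tau * (rho - delta))

def prox_g(z, nu):
    """Equation 30 for one complex entry: scale by nu, then clip to [-1, +1]."""
    def scale_and_clip(v):
        sign = 1.0 if v >= 0 else -1.0
        return sign * min(nu * abs(v), 1.0)
    return complex(scale_and_clip(z.real), scale_and_clip(z.imag))

# Hypothetical parameters with delta > rho, so that nu exceeds 1.
nu = nu_from(tau=0.5, rho=0.1, delta=0.6)   # 1 / 0.75
result = prox_g(0.3 - 2.0j, nu)             # real part scaled, imaginary part clipped
```

With these example values, `nu` is 4/3 and `result` is approximately `0.4 - 1j`. When $\delta > \rho$ the factor exceeds 1, so entries are pushed outward toward the boundary of the convex hull, in line with the purpose of the regularization term.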


Note that three sets of algorithm parameters have been introduced: $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma_u\}$, where $t = 1, \ldots, t_{\max}$ and $u = 1, \ldots, U$. In some examples, these parameters may be tuned manually. To avoid manual tuning of these parameters, other embodiments apply a neural-network-based approach. As the same algorithm parameters should work across several channel realizations, a per-UE parameter such as $\{\gamma_u\}$ is not meaningful. As a result, embodiments set $\gamma = \gamma_u$ for $u = 1, \ldots, U$. Furthermore, to provide the neural network with greater flexibility during optimization, $\gamma$ is allowed to be different in each iteration; i.e., another set of per-iteration parameters $\{\gamma^{(t)}\}$, $t = 1, \ldots, t_{\max}$, is introduced. The resulting algorithm is referred to as FAME-FBS, which is summarized as follows:
















Algorithm 1 (FAME-FBS). Initialize $\tilde{x}^{(1)}$ with either the maximum ratio combining (MRC) solution $h_u$ or the low-resolution vector $x_u$ computed by FL-MMSE, and fix the sets of parameters $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma^{(t)}\}$. Then, for every iteration $t = 1, \ldots, t_{\max}$, compute:

$$\tilde{z}^{(t+1)} = \left(I_B - \tau^{(t)} H \left(I_U - \gamma^{(t)} e_u e_u^H\right) H^H\right) \tilde{x}^{(t)}$$  Equation 31
$$\tilde{x}^{(t+1)} = \mathrm{prox}_g(\tilde{z}^{(t+1)})$$  Equation 32

Here, the proximal operator $\mathrm{prox}_g(\cdot)$ is the element-wise function given by Equation 30. The result $\tilde{x}^{(t_{\max}+1)}$ is quantized to the finite alphabet $\mathcal{X}$ to obtain $x_u^H$. Then, the optimal FAME scaling parameter $\beta_u$ is computed using Equation 12.









FAME-FBS supports multi-bit finite-alphabet equalization matrices. This is achieved by uniformly quantizing, in the range $[-1, +1]$, the real and imaginary parts of the solution vector $\tilde{x}^{(t_{\max}+1)}$, similar to what is done by FL-MMSE equalization. As a consequence, unlike FAME-SDR, FAME-FBS can operate with finite alphabets that contain (i) an odd number of elements and (ii) a zero element. Furthermore, FAME-FBS (and FL-MMSE) can be applied to PSK-like finite alphabets.
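The iteration of Algorithm 1 can be sketched for the 1-bit alphabet as follows. This is a non-limiting illustration under stated assumptions: the helper names, the 4×2 channel matrix, and the constant per-iteration parameter values are hypothetical and untuned (the text above describes tuning them manually or with a neural network), the initialization uses the MRC vector $h_u$, and the final quantization takes the sign of each component.

```python
def herm_times_vec(H, x):
    """Return H^H x, with H stored as B rows of U complex entries."""
    n_rows, n_cols = len(H), len(H[0])
    return [sum(H[b][k].conjugate() * x[b] for b in range(n_rows))
            for k in range(n_cols)]

def mat_times_vec(H, a):
    """Return H a."""
    return [sum(row[k] * a[k] for k in range(len(a))) for row in H]

def prox_g(z, nu):
    """Equation 30 for one entry: scale by nu, clip to [-1, +1]."""
    def scale_and_clip(v):
        return (1.0 if v >= 0 else -1.0) * min(nu * abs(v), 1.0)
    return complex(scale_and_clip(z.real), scale_and_clip(z.imag))

def fbs_step(H, x, u, tau, gamma):
    """Equation 31: z = (I - tau * H (I - gamma e_u e_u^H) H^H) x."""
    a = herm_times_vec(H, x)
    a[u] = (1.0 - gamma) * a[u]   # applies (I_U - gamma e_u e_u^H)
    return [xb - tau * hb for xb, hb in zip(x, mat_times_vec(H, a))]

def fame_fbs(H, u, rho, taus, nus, gammas):
    """Run the iterations, quantize to 1 bit, and scale per Equation 12."""
    x = [row[u] for row in H]     # MRC initialization with h_u
    for tau, nu, gamma in zip(taus, nus, gammas):
        x = [prox_g(zb, nu) for zb in fbs_step(H, x, u, tau, gamma)]
    x_q = [complex(1.0 if v.real >= 0 else -1.0,
                   1.0 if v.imag >= 0 else -1.0) for v in x]
    a = herm_times_vec(H, x_q)
    den = sum(abs(v) ** 2 for v in a) + rho * sum(abs(v) ** 2 for v in x_q)
    return x_q, a[u].conjugate() / den

H = [[1 + 0j, 0.5j], [0.5 - 0.5j, 1 + 0j], [-0.25j, 0.75 + 0j], [1 + 0j, -0.5 + 0.25j]]
t_max = 5
x_u, beta_u = fame_fbs(H, 0, 0.1, [0.05] * t_max, [1.05] * t_max, [2.0] * t_max)
```

Each iteration consists of two matrix-vector products and an element-wise clipping operation, which is the regular structure that the complexity discussion below relies on.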


Computational Complexity: The complexity of (i) computing the equalization matrix and (ii) performing equalization on a received vector $y$ is assessed for high-resolution and finite-alphabet equalization approaches. Computational complexity is measured as the number of real-valued multiplications performed by an algorithm.


Table I lists the computational complexity for computing a single equalization matrix using L-MMSE, FL-MMSE, FAME-SDR, and FAME-FBS. For the infinite-precision L-MMSE equalizer, the complexity corresponds to explicitly computing the equalization matrix $W^H$. For the finite-alphabet equalizers (FL-MMSE and FAME-based algorithms), the complexity corresponds to the computation of the low-resolution matrix $X^H$ and the scaling factors in the vector $\beta$. Solving FAME-SDR results in the highest complexity, which asymptotically scales as $O(B^{4.5})$ unless specific problem structures can be exploited. Evidently, FAME-SDR does not scale well to systems with a large antenna array 26. FAME-FBS has the same asymptotic scaling of $O(BU^2)$ as L-MMSE and FL-MMSE equalization, making it suitable for massive MU-MIMO mmWave systems.









TABLE I
Complexity for computing an equalization matrix

Algorithm    Computational complexity                                                 Asymptotic scaling
L-MMSE       $2U^3 + 6BU^2 - 2BU - 2U + 1$                                            $O(BU^2)$
FL-MMSE      $10BU^2 + 2U^3 + 2U^2 + U + 1$                                           $O(BU^2)$
FAME-SDR     n.a.                                                                     $O(B^{4.5})$
FAME-FBS     $(8t_{\max} + 4)BU^2 + 2U^2 + (4t_{\max} + 2)U + (2t_{\max} + 3)U$       $O(BU^2)$









While the constant associated with the term BU2 is larger for FAME-FBS than for L-MMSE and FL-MMSE, the complexity of the latter algorithms appears to be higher in practice. Computing the L-MMSE (and the FL-MMSE) equalizer in hardware requires square roots and divisions, which result in high numerical precision requirements. Furthermore, the Cholesky decomposition and forward- and back-substitution procedures required when computing the L-MMSE (and the FL-MMSE) equalization matrix result in stringent data dependencies that limit parallelism and, hence, reduce throughput. In contrast, FAME-FBS has a regular structure with few data dependencies and the matrix-vector multiplications can be parallelized easily. In addition, one can parallelize computation per UE as the FAME problem in Equation 11 is independent for u=1, . . . , U. In fact, a simple hardware engine could be used to efficiently execute FAME-FBS to determine the low-resolution equalization vectors xuH.


After computing the equalization matrix, one must perform spatial equalization on the received signal vectors $y$ at the rate of the ADCs. For the infinite-precision L-MMSE equalizer, this corresponds to computing one high-resolution matrix-vector product $\hat{s} = W^H y$ per receive vector. For finite-alphabet equalizers, this corresponds to a low-resolution matrix-vector product $z = X^H y$, followed by $U$ high-resolution products $\hat{s}_u = \beta_u^* z_u$, $u = 1, \ldots, U$. The complexity of equalization is summarized in Table II, which distinguishes between high-resolution and low-resolution multiplications. While finite-alphabet equalization performs more multiplications than a conventional equalizer, most of these multiplications are performed at low resolution. Thus, for sufficiently low resolution, finite-alphabet equalization effectively reduces the complexity of spatial equalization.









TABLE II
Complexity of finite-alphabet equalization

                   Real-valued multiplication count
Equalization       High resolution    Low resolution
Traditional        $4BU$              $0$
Finite Alphabet    $4U$               $4BU$









While spatial equalization must be carried out at symbol rate, the computation of the equalization matrix must only be carried out if the channel matrix changes. Due to operation at extremely high bandwidths, the complexity of performing equalization will dominate in most mmWave systems. For scenarios with short coherence times, methods that minimize the complexity of computing the equalization matrix are to be preferred.
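The per-vector multiplication counts of Table II can be evaluated for a concrete system size. The sketch below assumes the $B = 256$, $U = 16$ configuration used elsewhere in this disclosure; the function name and dictionary keys are illustrative.

```python
def equalization_multiplications(n_antennas, n_ues):
    """Real-valued multiplications per received vector, following Table II.

    Each complex multiplication is counted as four real multiplications.
    """
    traditional = {"high": 4 * n_antennas * n_ues, "low": 0}
    finite_alphabet = {"high": 4 * n_ues, "low": 4 * n_antennas * n_ues}
    return traditional, finite_alphabet

traditional, finite_alphabet = equalization_multiplications(256, 16)
# Share of high-resolution multiplications that remain with finite alphabets.
high_res_share = finite_alphabet["high"] / traditional["high"]
```

For this configuration the conventional equalizer performs 16,384 high-resolution multiplications per received vector, whereas the finite-alphabet equalizer performs 64 high-resolution and 16,384 low-resolution multiplications, so `high_res_share` equals $1/B$.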


D. Evaluation

With reference to FIGS. 6A-6B and 7A-7B, the uncoded BER performance of FAME-based algorithms is evaluated and compared to infinite-precision L-MMSE equalization and FL-MMSE. The following simulation results are obtained by carrying out $10^4$ Monte-Carlo trials. The per-iteration parameters $\{\gamma^{(t)}\}$, $\{\tau^{(t)}\}$, and $\{\nu^{(t)}\}$ of FAME-FBS are tuned using a neural network; the neural network is trained using $10^4$ channel realizations, which differ from the ones used to evaluate the BER. For all equalizers, the entries of the channel matrices $H$ are quantized to 8 bits per real and imaginary component. In addition, the received signal vectors are quantized to 7 bits per real and imaginary component, which is sufficient to achieve virtually the same performance as with double-precision representation.



FIG. 6A is a graphical representation of uncoded BER for an 8 antenna, 2 UE scenario using quadrature phase-shift keying (QPSK) in an i.i.d. Rayleigh-fading scenario under various equalization approaches. For this example, 1-bit FAME-EXH significantly outperforms 1-bit FL-MMSE, which suffers from an error floor. Since the complexity of FAME-EXH scales exponentially in $B$, it cannot be used for significantly larger systems. Hence, the performance of FAME-SDR is also shown; it exhibits a 2 dB loss at a BER of $10^{-3}$ compared to FAME-EXH.



FIG. 6B is a graphical representation of uncoded BER for a 64 antenna, 4 UE scenario using 16-QAM in an i.i.d. Rayleigh-fading scenario under various equalization approaches. Since FAME-SDR, unlike FAME-EXH, remains tractable at this array size, this example shows that FAME-SDR continues to substantially outperform 1-bit FL-MMSE. As discussed above, however, FAME-SDR does not scale to systems with substantially more BS antennas, whereas FAME-FBS exhibits the same asymptotic complexity scaling as L-MMSE and FL-MMSE equalization. FIG. 6B illustrates that FAME-FBS performs on par with FAME-SDR at much lower complexity.



FIG. 7A is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and an i.i.d. Rayleigh-fading channel model under various equalization approaches. The performance behavior of 1-bit FL-MMSE and 1-bit FAME-FBS is similar to what was observed for smaller systems. FIG. 7A also shows the performance of finite-alphabet matrices with resolutions larger than 1 bit. In this regard, the performance gap between FAME-FBS and FL-MMSE is more pronounced for 1-bit and 2-bit finite-alphabet equalization matrices than for 3-bit. Note that finite-alphabet equalizers achieve virtually the same performance as infinite-precision L-MMSE equalization when using 6 bits; nonetheless, 3 bits are sufficient to operate at SNRs lower than 4 dB.



FIG. 7B is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and a non-LoS QuaDRiGa mmWave channel model under various equalization approaches. Since i.i.d. Rayleigh-fading channels are a poor model for mmWave propagation conditions, FIG. 7B shows the performance of FAME-FBS operating over more realistic mmWave channels generated using the QuaDRiGa mmWave model (see S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242-3256, June 2014). In this example, mmWave systems are simulated with a carrier frequency of 60 GHz within the “mmMAGIC_UMi” scenario of QuaDRiGa.



FIG. 7C is a graphical representation of uncoded BER for a 256 antenna, 16 UE scenario using 16-QAM and an LoS QuaDRiGa mmWave channel model under various equalization approaches. This example is simulated in a similar manner as FIG. 7B, under LoS rather than non-LoS propagation conditions.


Power control is simulated for FIGS. 7B and 7C by ensuring that the received UE powers are in the range ±3 dB. Specifically, for each channel realization, the UE with the highest power has 4× the power of the UE with the lowest power. Furthermore, the UEs are randomly placed in a sector of 120° in front of the BS antenna array with a distance ranging from 10 meters (m) to 110 m, and a minimum angular separation of 4°. From FIGS. 7B and 7C it is observed that FAME-FBS outperforms FL-MMSE for both non-LoS and LoS channels, following essentially the same trends as for Rayleigh-fading channels. These simulation results are a clear indicator that FAME still performs well under more realistic mmWave propagation conditions, while having the potential to significantly reduce power consumption and silicon area.


Hardware-Level Evaluation: To demonstrate the real-world benefits of finite-alphabet equalization, the power and area savings that can be attained in comparison with conventional, high-resolution equalizers are quantified. To arrive at a fair comparison between finite-alphabet equalization and conventional, high-resolution equalizers, two equalization circuits were implemented: one for finite-alphabet equalization and one for high-resolution equalization.


The high-resolution equalizer computes a matrix-vector product between the U×B equalization matrix WH and the received vector y. The matrix-vector product is computed in a column-by-column fashion by using a linear array of U parallel multiply-accumulate (MAC) units over B clock cycles. The multipliers in the MAC units are high-resolution and take as input 10-bit numbers from the equalization matrix WH and 7-bit numbers from the received vector y. The accumulators in the MAC units use 18 bits. Finally, 9 bits are taken from both real and imaginary accumulators as the outputs of each MAC unit. These outputs correspond to the estimates ŝ=WHy.


The finite-alphabet equalizer computes a low-resolution matrix-vector product between the $U \times B$ finite-alphabet matrix $X^H$ and the received vector $y$. This matrix-vector product is implemented in the same way as in the traditional equalizer, with the difference that far fewer bits are used for the multipliers and accumulators. The multipliers take as input $r$-bit numbers from $X^H$ and 7-bit numbers from $y$, while the accumulators use $r + 13$ bits (except for the case where $r = 1$, where the accumulators use 13 bits). 9 bits are taken from the accumulators in each MAC unit as the output of the low-resolution matrix-vector product $X^H y$. Unlike conventional equalization, the entries of the $U$-dimensional vector $X^H y$ are scaled by the values in $\beta$. This scaling operation is implemented with a high-resolution multiplier that computes the product between the 9-bit $x_u^H y$ and the 10-bit scaling factor $\beta_u$. The outputs of this multiplier are the estimates $\hat{s} = V^H y$, which are represented with 9 bits per real and imaginary component.


Table III lists post-layout implementation results for the circuits discussed above implemented for a B=256 BS antenna, U=16 UE system, using a 28 nm CMOS technology. The traditional, high-resolution equalizer corresponds to the design with an equalization resolution r of 10 bits, whereas the finite-alphabet equalizer was implemented for r={1, 2, . . . , 5} bits. To allow for a fair comparison between the different equalization circuits, we consider a scenario in which all of the designs support the same throughput. A throughput of 2 G (complex-valued) vectors/s is assumed, which implies that the 2B ADCs at the BS run at 2 G samples/s.









TABLE III
Implementation results in 28 nm CMOS for one equalizer instance operating in a system with B = 256 and U = 16

Equalization resolution r [bit]    1       2       3       4       5       10
Silicon area [mm^2]                0.06    0.08    0.10    0.14    0.16    0.26
Clock freq. [GHz]                  1.33    1.25    1.25    1.16    1.16    1.05
Throughput [M vectors/s]           5.18    4.88    4.88    4.53    4.53    4.10
Power [mW]                         18.5    29.2    38.8    42.6    51.3    57.1









As seen from Table III, a single instance of the equalizer design reaches throughputs on the order of a few M vectors/s, which is well below the target throughput of 2 G vectors/s. However, a time-multiplexed array of equalizers that achieves the desired throughput can be instantiated (at the expense of increased area).



FIG. 8A is a graphical representation of power as a function of equalizer resolution for a finite-alphabet equalizer 22 hardware designed in 28 nm CMOS technology for a 256 antenna, 16 UE 28 massive MU-MIMO embodiment of the wireless communication system 18 of FIG. 2. FIG. 8B is a graphical representation of area as a function of equalizer resolution for the finite-alphabet equalizer 22 hardware of FIG. 8A. Assuming no overhead for this replication approach, the total silicon area and power consumption required to perform equalization are estimated in a high-bandwidth mmWave setting.


In the example of FIGS. 8A and 8B, all equalizers operate at a rate of 2 G vectors/s and 7 bits are used to represent the entries of the received vector $y$. For an equalizer resolution lower than 6 bits, a finite-alphabet equalizer consisting of a low-resolution matrix-vector product is used, followed by per-UE high-resolution scaling. The 10-bit equalizer resolution is implemented with a high-resolution matrix-vector product. Finite-alphabet equalization (shown at 1-, 2-, 3-, 4-, and 5-bit) can reduce the power and area of conventional, high-precision equalization (shown at 10-bit). For example, halving the number of bits used for the high-resolution equalizer already introduces substantial gains of 20% and 44% lower power and area, respectively. Further reducing the equalizer resolution reduces the power and area by a factor of 3.9× and 5.8×, respectively, when a 1-bit finite-alphabet equalizer is used.
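The scaling from a single equalizer instance to the 2 G vectors/s target can be reproduced from the entries of Table III. The sketch below is a non-limiting illustration that assumes ideal replication with no overhead and a fractional number of instances; the dictionary layout and function name are assumptions.

```python
# Table III entries per resolution r: (area in mm^2, throughput in M vectors/s, power in mW).
TABLE_III = {
    1: (0.06, 5.18, 18.5),
    2: (0.08, 4.88, 29.2),
    3: (0.10, 4.88, 38.8),
    4: (0.14, 4.53, 42.6),
    5: (0.16, 4.53, 51.3),
    10: (0.26, 4.10, 57.1),
}
TARGET_MVEC_PER_S = 2000.0  # 2 G vectors/s

def scaled_area_and_power(r):
    """Total area [mm^2] and power [W] at the target rate, ideal replication."""
    area, throughput, power_mw = TABLE_III[r]
    instances = TARGET_MVEC_PER_S / throughput  # fractional count, no overhead
    return instances * area, instances * power_mw / 1000.0

area10, power10 = scaled_area_and_power(10)
area5, power5 = scaled_area_and_power(5)
area1, power1 = scaled_area_and_power(1)

power_saving_5bit = 1.0 - power5 / power10   # roughly 0.19
area_saving_5bit = 1.0 - area5 / area10      # roughly 0.44
power_factor_1bit = power10 / power1         # roughly 3.9
```

These ratios are computed from the rounded table entries, so they may differ slightly from the figures quoted in the text.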


Note that the power and area can be reduced further. Once the number of bits in the equalization matrix has been reduced to 5 bits or below, emerging processing-in-memory architectures can lower the area and power by about 2× to 4×, in addition to the savings above.


E. Soft-Output Finite-Alphabet Equalization

In some embodiments, the finite-alphabet equalization approach described above is extended by unbiased estimation and soft-output computation. A compact expression of the post-equalization MSE is derived, which can be used to efficiently compute log-likelihood ratio (LLR) values. The effectiveness of this extension is demonstrated by error-rate simulation results for a coded massive MU-MIMO-OFDM system, for two unbiased soft-output finite-alphabet equalizers, both in LoS and non-LoS mmWave channel scenarios.


This subsection focuses on a massive MU-MIMO embodiment of the wireless communications system 18 of FIG. 2 where U UEs 28 transmit data to a radio access node 20 with an antenna array 26 having B antennas. A central task of the radio access node 20 is to generate estimates of the transmitted data vector $s$ using the received vector $y$ and knowledge of the channel matrix $H$. At the high bandwidths offered by mmWave systems, linear estimators are preferable due to their simplicity. This subsection therefore focuses on linear spatial equalizers that compute estimates $\bar{s}$ of the transmit signals $s$ as $\bar{s}=W^H y$. Here, $W^H\in\mathbb{C}^{U\times B}$ is an L-MMSE equalization matrix, which minimizes the MSE defined by:

$$\mathrm{MSE}=\mathbb{E}_{s,n}\big[\|s-\bar{s}\|_2^2\big]$$  Equation 33


Under the statistical assumptions on s and n of Section I-A above, the L-MMSE equalization matrix is given by:

$$W^H=(\rho I_U+H^H H)^{-1}H^H$$  Equation 34

where $\rho=N_0/E_s$. The rows $w_u^H$, $u=1,\ldots,U$, of the L-MMSE equalizer $W^H$ can be computed by solving:

$$w_u=\arg\min_{\tilde{w}\in\mathbb{C}^B}\ \|e_u-H^H\tilde{w}\|_2^2+\rho\|\tilde{w}\|_2^2$$  Equation 35
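The relation between Equation 34 and Equation 35 can be checked numerically. The following is a minimal sketch, assuming NumPy, toy dimensions, an i.i.d. Gaussian channel, and an arbitrary value of ρ (all of which are illustrative assumptions, not values from this disclosure). It computes the L-MMSE matrix and verifies that one of its rows satisfies the stationarity condition of the per-UE problem.

```python
import numpy as np

rng = np.random.default_rng(0)
B, U = 8, 3      # toy sizes for illustration (assumption)
rho = 0.1        # rho = N0 / Es (assumption)

# Uplink channel H is B x U; column u is the channel of UE u.
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)

# Equation 34, using a linear solve instead of an explicit inverse.
WH = np.linalg.solve(rho * np.eye(U) + H.conj().T @ H, H.conj().T)

def objective(w, u):
    """Objective of Equation 35 for a candidate vector w in C^B."""
    e_u = np.eye(U)[:, u]
    return np.linalg.norm(e_u - H.conj().T @ w) ** 2 + rho * np.linalg.norm(w) ** 2

u = 1
w_u = WH[u, :].conj()   # row u of W^H is w_u^H, so w_u is its conjugate

# Stationarity condition of Equation 35: (H H^H + rho I_B) w_u = h_u.
residual = (H @ H.conj().T + rho * np.eye(B)) @ w_u - H[:, u]
print(np.linalg.norm(residual))
```

The printed residual norm should be at the level of floating-point round-off, and any perturbation of `w_u` should increase `objective`.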


Spatial equalization with a biased L-MMSE estimate for each user u=1, . . . , U amounts to computing:

$$\bar{s}_u=w_u^H y=w_u^H h_u s_u+w_u^H\tilde{n}_u$$  Equation 36

where $h_u$ is the $u$th column of $H$ and

$$\tilde{n}_u=\sum_{i=1,\,i\neq u}^{U}h_i s_i+n$$

is the noise-plus-interference (NPI) vector. In general, the L-MMSE equalizer has rows for which $w_u^H h_u\neq 1$. Thus, to perform unbiased estimation, the goal is to compute the estimates for each UE $u=1,\ldots,U$ as follows:

$$\hat{s}_u=\frac{\bar{s}_u}{w_u^H h_u}=\frac{w_u^H y}{w_u^H h_u}=s_u+\frac{w_u^H\tilde{n}_u}{w_u^H h_u}$$  Equation 37

In general, the biased $\bar{s}_u$ and unbiased $\hat{s}_u$ estimates differ: biased estimates minimize the MSE in Equation 33, whereas unbiased estimates typically achieve lower error rates.
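The difference between the biased and unbiased estimates can be illustrated with a small numerical sketch. It assumes NumPy, toy dimensions, and a noise-free transmission in which only one UE is active, so that the NPI term vanishes and the effect of the gain $w_u^H h_u$ is isolated. These choices are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
B, U, rho = 16, 4, 0.5   # toy sizes and regularization (assumptions)
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
WH = np.linalg.solve(rho * np.eye(U) + H.conj().T @ H, H.conj().T)  # L-MMSE matrix

# Effective gain w_u^H h_u of every UE: the diagonal of W^H H.
gain = np.diag(WH @ H)

u = 2
s_u = (1 + 1j) / np.sqrt(2)   # a unit-energy QPSK symbol
y = H[:, u] * s_u             # only UE u transmits and noise is omitted

s_bar = WH[u] @ y             # biased estimate: shrunk toward zero by gain[u]
s_hat = s_bar / gain[u]       # unbiased estimate: recovers s_u exactly here

print(abs(s_bar), abs(s_hat))
```

In this setting the biased estimate has magnitude below 1 because the gain lies strictly between 0 and 1, whereas the unbiased estimate equals the transmitted symbol.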


While the discussion of subsections A-C above focuses on hard-output data detection, coded communication systems benefit from spatial equalizers that compute soft outputs. To fully exploit forward error correction, the post-equalization NPI variance is extracted and then used to generate LLR values. For the $u$th UE, the NPI variance is given by the MSE of the unbiased estimate $\hat{s}_u$, which is computed as follows:

$$\nu_u^2=\mathbb{E}_{s,n}\big[|\hat{s}_u-s_u|^2\big]$$  Equation 38

$$\overset{(a)}{=}\frac{\mathbb{E}_{s,n}\big[|x_u^H H(I_U-e_u e_u^H)s+x_u^H n|^2\big]}{|x_u^H h_u|^2}$$  Equation 39

$$=\frac{E_s\big(\|H^H x_u\|_2^2-|x_u^H h_u|^2\big)+N_0\|x_u\|_2^2}{|x_u^H h_u|^2}$$  Equation 40

$$=\frac{E_s}{h_u^H x_u}\,\frac{\|H^H x_u\|_2^2+\rho\|x_u\|_2^2}{x_u^H h_u}-E_s$$  Equation 41

$$\overset{(b)}{=}E_s\Big(\big(\beta_u(x_u)\,h_u^H x_u\big)^{-1}-1\Big)$$  Equation 42

Here, (a) follows from Equation 37 and (b) from Equation 12. Note that this result applies to any finite-alphabet equalizer as in Equation 7, as long as $\beta_u(x_u)$ is computed as in Equation 12.
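The agreement between the direct expression in Equation 40 and the compact expression in Equation 42 can be checked numerically. The sketch below assumes NumPy and toy dimensions. It also assumes that Equation 12 (not reproduced in this subsection) has the form $\beta_u(x_u)=x_u^H h_u/(\|H^H x_u\|_2^2+\rho\|x_u\|_2^2)$, which is the form implied by steps (41) and (42). The sign-quantized matched filter used for $x_u$ is only a hypothetical stand-in for an FL-MMSE or FAME-FBS row.

```python
import numpy as np

rng = np.random.default_rng(2)
B, U = 16, 4            # toy sizes (assumption)
Es, N0 = 1.0, 0.2       # symbol energy and noise variance (assumptions)
rho = N0 / Es
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)

u = 0
h_u = H[:, u]
# Hypothetical 1-bit finite-alphabet vector: sign-quantized matched filter.
x_u = np.sign(h_u.real) + 1j * np.sign(h_u.imag)

energy = np.linalg.norm(H.conj().T @ x_u) ** 2   # ||H^H x_u||_2^2
x_norm2 = np.linalg.norm(x_u) ** 2               # ||x_u||_2^2
gain = np.vdot(x_u, h_u)                         # x_u^H h_u (vdot conjugates its first argument)

# Equation 40: direct expression of the NPI variance.
nu2_direct = (Es * (energy - abs(gain) ** 2) + N0 * x_norm2) / abs(gain) ** 2

# Assumed form of Equation 12, followed by the compact expression of Equation 42.
beta_u = gain / (energy + rho * x_norm2)
nu2_compact = (Es * (1.0 / (beta_u * np.vdot(h_u, x_u)) - 1.0)).real

print(nu2_direct, nu2_compact)
```

Both printed values should coincide up to round-off, which reflects that only $\beta_u(x_u)$ and the inner product $h_u^H x_u$ are needed to obtain the variance.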


With this, soft outputs can be computed in the form of LLR values, by assuming that the residual error $\hat{s}_u-s_u$ is circularly-symmetric Gaussian with variance $\nu_u^2$. Concretely, the LLR values are computed as follows:

$$\Lambda_{u,q}=\log\Bigg(\sum_{s\in\mathcal{S}_q^{(1)}}\exp\bigg(-\frac{|\hat{s}_u-s|^2}{\nu_u^2}\bigg)\Bigg)-\log\Bigg(\sum_{s\in\mathcal{S}_q^{(0)}}\exp\bigg(-\frac{|\hat{s}_u-s|^2}{\nu_u^2}\bigg)\Bigg)$$  Equation 43

Here, $\mathcal{S}_q^{(1)}$ and $\mathcal{S}_q^{(0)}$ are the subsets of the constellation $\mathcal{S}$ in which the $q$th bit is 1 and 0, respectively. Note that computing soft outputs for finite-alphabet equalizers entails the same complexity as for infinite-precision L-MMSE.
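The LLR computation of Equation 43 can be sketched in a few lines. The example below is an illustration only: it assumes a Gray-mapped, unit-energy QPSK constellation (the simulations in this disclosure use 16-QAM), and the names `CONSTELLATION` and `llr` are hypothetical. A log-sum-exp formulation is used for numerical stability.

```python
import math

# Hypothetical Gray-mapped QPSK constellation: bit labels (b0, b1) -> symbol.
CONSTELLATION = {
    (0, 0): complex(+1, +1) / math.sqrt(2),
    (0, 1): complex(+1, -1) / math.sqrt(2),
    (1, 0): complex(-1, +1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}

def llr(s_hat, nu2, q):
    """LLR of bit q given the unbiased estimate s_hat and NPI variance nu2."""
    def log_sum(bit):
        # Exponents of all candidate symbols whose q-th bit equals `bit`.
        terms = [-abs(s_hat - s) ** 2 / nu2
                 for bits, s in CONSTELLATION.items() if bits[q] == bit]
        m = max(terms)  # log-sum-exp trick
        return m + math.log(sum(math.exp(t - m) for t in terms))
    return log_sum(1) - log_sum(0)

# A positive LLR favors bit value 1, a negative LLR favors bit value 0.
print(llr(0.3 + 0.5j, 0.5, 0), llr(0.3 + 0.5j, 0.5, 1))
```

For this particular mapping the real and imaginary parts separate, so the LLR of bit 0 reduces to $-2\sqrt{2}\,\Re\{\hat{s}_u\}/\nu_u^2$; a smaller NPI variance therefore yields LLR values of larger magnitude.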


The FL-MMSE and FAME-FBS algorithms described above can be used to obtain the rows $x_u^H$ of $X^H$. For both algorithms, once $x_u^H$ is known, the associated $\beta_u(x_u)$ is computed using Equation 12; this factor is required to compute the variance $\nu_u^2$ using Equation 42, which is then used to compute LLR values with Equation 43.



FIGS. 9A-9C show coded BER for FL-MMSE and FAME-FBS using 1 to 3 bits per real and imaginary part for each entry of the low-resolution equalization matrix $X^H$. The simulation results correspond to a B=256 antenna array 26, U=16 UEs 28, in a 16-QAM system with OFDM transmission over W=1200 subcarriers. Per-UE rate-¾ convolutional codes and soft-input Viterbi decoding are used as well.



FIG. 9A is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using an i.i.d. Rayleigh-fading channel model under various equalization approaches. FIG. 9B is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using a non-LoS QuaDRiGa mmWave channel model under various equalization approaches. FIG. 9C is a graphical representation of ¾ coded BER for soft-output finite-alphabet equalization using an LoS QuaDRiGa mmWave channel model under various equalization approaches. These figures illustrate that the coded BER performance of FAME-FBS meets or exceeds that of FL-MMSE for all of the considered scenarios. The discrepancy between these two methods decreases when increasing the number of bits used for the finite-alphabet equalization matrix. While with 1-bit, FAME-FBS offers more than 10× lower BER at 6 dB SNR compared to FL-MMSE for the non-LoS channel, the performance of the 3-bit FL-MMSE and FAME-FBS is practically the same and is within 1.5 dB of that of the infinite-precision L-MMSE for all considered scenarios.


II. Finite-Alphabet Precoder (Downlink)

Similar to the case of equalization in the uplink, the power consumption and silicon area of precoding in the all-digital mmWave MU-MIMO downlink (e.g., from the core network 30 to the UEs 28 of FIG. 2) are expected to be a major bottleneck, as high-dimensional data has to be processed at extremely high rates. In order to reduce power consumption and silicon area of the precoding operation, in some embodiments the concept of finite-alphabet matrices used for linear spatial equalization is applied to linear precoding. Two finite-alphabet precoding schemes are proposed to compute Wiener filter (WF)-optimal matrices, which are matrices that best mimic the linear WF precoder. To demonstrate the effectiveness of this framework, referred to herein as FAWP, its performance is evaluated in terms of uncoded BER and EVM for i.i.d. Rayleigh-fading, and for LoS and non-LoS mmWave channels.


A. System Model

A model is presented which focuses on the downlink of a mmWave massive MU-MIMO embodiment of the wireless communications system 18 of FIG. 2 in which a radio access node 20 with an antenna array 26 having B antennas serves $U<B$ single-antenna UEs 28 in the same time-frequency resource. A narrowband scenario is considered and modeled by $y=Hx+n$, where $y\in\mathbb{C}^U$ is the received vector, $H\in\mathbb{C}^{U\times B}$ is the channel matrix (e.g., received channel matrix, which may include multiple received channel matrices), $x\in\mathbb{C}^B$ is the precoded vector, and $n\in\mathbb{C}^U$ is i.i.d. circularly-symmetric complex Gaussian noise with variance $N_0$ per complex entry. It is assumed that the channel matrix $H$ is perfectly known to the radio access node 20, and that the precoded vector $x$ is subject to the following average power constraint:

$$\mathbb{E}_x\big[\|x\|_2^2\big]\leq P$$  Equation 44


The goal of precoding is to simultaneously transmit constellation points $s_u\in\mathcal{S}$ to the $u=1,\ldots,U$ UEs while reducing MU interference. Here, $s_u$ is assumed to have zero mean and variance $E_s$, and $\mathcal{S}$ denotes the constellation set (e.g., 16-QAM). The radio access node 20 maps the vector $s$ into the precoded vector $x$ with the aid of channel state information (e.g., received from the channel estimator circuitry 34). The precoded vector $x$ is crafted such that the UEs 28 can form an estimate $\hat{s}_u\in\mathbb{C}$ of the transmitted symbol $s_u$ simply by scaling the received signal $y_u$. Specifically, it is assumed that each UE forms an estimate as $\hat{s}_u=\beta y_u$. Here, $\beta\in\mathbb{R}^+$ is a precoding factor that can be estimated at the UE using pilot-based transmission.


This section focuses on linear precoders for which it holds that $x=Ps$, where $P\in\mathbb{C}^{B\times U}$ is the precoding matrix. Thus, some embodiments use linear precoders that attempt to minimize the MSE between the estimated symbols $\hat{s}$ and the transmitted symbols $s$:

$$\mathrm{MSE}=\mathbb{E}_{s,n}\big[\|s-\hat{s}\|_2^2\big]=\mathbb{E}_{s}\big[\|s-\beta Hx\|_2^2\big]+\beta^2 U N_0$$  Equation 45

Minimizing Equation 45 subject to the power constraint in Equation 44 results in the so-called WF precoder, where the precoding matrix is given by $P^{WF}=\frac{1}{\beta^{WF}}Q^{WF}$ with:

$$Q^{WF}=\big(H^H H+\kappa^{WF}I_B\big)^{-1}H^H$$  Equation 46

$$\kappa^{WF}=\frac{UN_0}{P},\quad\text{and}\quad\beta^{WF}=\sqrt{\frac{\mathrm{tr}\big((Q^{WF})^H Q^{WF}\big)E_s}{P}}$$  Equation 47

It is important to realize that the matrix $Q^{WF}\in\mathbb{C}^{B\times U}$ is the solution of the following optimization problem:

$$Q^{WF}=\arg\min_{\tilde{Q}\in\mathbb{C}^{B\times U}}\ \|I_U-H\tilde{Q}\|_F^2+\kappa^{WF}\|\tilde{Q}\|_F^2$$  Equation 48

The columns $q_u^{WF}\in\mathbb{C}^B$, $u=1,\ldots,U$, of the matrix $Q^{WF}$ can be computed by solving:

$$q_u^{WF}=\arg\min_{\tilde{q}\in\mathbb{C}^B}\ \|e_u-H\tilde{q}\|_2^2+\kappa^{WF}\|\tilde{q}\|_2^2$$  Equation 49

Applying the Woodbury identity to Equation 46 yields:

$$Q^{WF}=H^H\big(HH^H+\kappa^{WF}I_U\big)^{-1}$$  Equation 50

which is the solution to the following optimization problem:

$$Q^{WF}=\arg\min_{\tilde{Q}\in\mathbb{C}^{B\times U}}\ \|I_B-\tilde{Q}H\|_F^2+\kappa^{WF}\|\tilde{Q}\|_F^2$$  Equation 51

Thus, the rows $q_b^{r,WF}$, $b=1,\ldots,B$, of $Q^{WF}$ (where the superscript $r$ denotes a row vector) can be computed as:

$$q_b^{r,WF}=\arg\min_{\tilde{q}^r\in\mathbb{C}^{1\times U}}\ \|e_b^H-\tilde{q}^r H\|_2^2+\kappa^{WF}\|\tilde{q}^r\|_2^2$$  Equation 52


The alternative optimization problems in Equation 49 and Equation 52 to compute the matrix $Q^{WF}$ will become useful in the next subsection.
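The two closed-form expressions for the WF matrix, and the role of the precoding factor in meeting the power constraint, can be checked with a short numerical sketch. It assumes NumPy, toy dimensions, an i.i.d. Gaussian channel, and arbitrary values for $E_s$, $N_0$, and $P$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
B, U = 16, 4                  # toy sizes (assumption)
Es, N0, P = 1.0, 0.1, 1.0     # symbol energy, noise variance, power budget (assumptions)

# Downlink channel H is U x B; row u is the channel to UE u.
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)
kappa = U * N0 / P

# Equation 46: requires solving a B x B system.
Q46 = np.linalg.solve(H.conj().T @ H + kappa * np.eye(B), H.conj().T)
# Equation 50 (after the Woodbury identity): only a U x U inverse is needed.
Q50 = H.conj().T @ np.linalg.inv(H @ H.conj().T + kappa * np.eye(U))

# Equation 47: precoding factor that normalizes the average transmit power.
beta = np.sqrt(np.trace(Q50.conj().T @ Q50).real * Es / P)
P_wf = Q50 / beta

# Average power E[||P s||^2] = Es * tr(P^H P) for uncorrelated symbols of variance Es.
avg_power = Es * np.trace(P_wf.conj().T @ P_wf).real
print(np.allclose(Q46, Q50), avg_power)
```

Since $U<B$, the form in Equation 50 involves the smaller matrix inverse, which is the reason it is typically preferred for computation.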


B. FAWP: Finite-Alphabet WF Precoding

WF precoding computes $x=P^{WF}s=\frac{1}{\beta^{WF}}Q^{WF}s$ for each transmitted vector $s$. Unfortunately, digital precoding circuitry will be power hungry and large as mmWave MU-MIMO systems operate with high-dimensional data and extremely high sampling rates. As a remedy, FAWP proposes to represent the matrix $Q^{WF}$ using coarsely quantized numbers, with the objective of reducing the hardware complexity of the matrix-vector product $Q^{WF}s$. However, a direct quantization of the matrix $Q^{WF}$ typically leads to a significant error-rate degradation.


In order to design low-resolution matrices that are WF-optimal, i.e., that best mimic the infinite-precision WF-precoding matrix $Q^{WF}$, finite-alphabet matrices are used, similar to those described above for spatial equalization in the mmWave MU-MIMO uplink. Since finite-alphabet matrices are applied to imitate the WF-precoding matrix $Q^{WF}$, they are referred to herein as FAWP matrices. FAWP matrices introduce a few high-resolution scaling factors that help to bring a low-resolution matrix to the right scale. Two distinct FAWP matrix structures, namely pre-FAWP and post-FAWP matrices, are considered here.


Pre-FAWP Matrix: A pre-FAWP matrix is defined as a B×U matrix with the structure:

$$Q=A\,\mathrm{diag}(\alpha^*)$$  Equation 53

where $A\in\mathcal{X}^{B\times U}$ is a low-resolution matrix with entries taken from the finite alphabet $\mathcal{X}$ and $\alpha\in\mathbb{C}^U$ is a vector with per-UE scaling factors.


By using a pre-FAWP matrix, the matrix-vector product $Qs$ becomes $A(\mathrm{diag}(\alpha^*)s)$. Such a matrix is called pre-FAWP as the U entries of the transmitted symbol vector $s$ are scaled by the entries of $\alpha^*$ before getting multiplied with the matrix $A$. Pre-FAWP reduces hardware complexity of $Qs$ since the matrix $A$ has low-resolution entries. Consider, for example, the case in which the entries of $A$ are chosen from the 1-bit alphabet $\mathcal{X}=\{\pm1\pm j\}$; multiplying this matrix $A$ with the vector $\mathrm{diag}(\alpha^*)s$ does not require hardware multipliers, but only adders and subtractors.
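The multiplier-free property of the 1-bit case can be made concrete in software. The following is a minimal sketch, not a hardware description: the function name `one_bit_matvec` and the split of the matrix into separate real and imaginary sign arrays are assumptions made for illustration. Each complex product $(a_\Re+ja_\Im)(v_\Re+jv_\Im)$ with $a_\Re,a_\Im\in\{\pm1\}$ is evaluated with sign selections, additions, and subtractions only.

```python
def one_bit_matvec(A_re, A_im, v):
    """Compute A v for A = A_re + j*A_im with entries in {+1, -1}.

    `v` is the already scaled vector diag(alpha^*) s. Only additions,
    subtractions, and sign selections are used.
    """
    out = []
    for row_re, row_im in zip(A_re, A_im):
        acc_re, acc_im = 0.0, 0.0
        for a_re, a_im, x in zip(row_re, row_im, v):
            # Real part of the product: a_re * x.real - a_im * x.imag
            acc_re += x.real if a_re > 0 else -x.real
            acc_re -= x.imag if a_im > 0 else -x.imag
            # Imaginary part of the product: a_re * x.imag + a_im * x.real
            acc_im += x.imag if a_re > 0 else -x.imag
            acc_im += x.real if a_im > 0 else -x.real
        out.append(complex(acc_re, acc_im))
    return out

# Small usage example with a 3 x 2 one-bit matrix.
A_re = [[1, -1], [-1, -1], [1, 1]]
A_im = [[-1, 1], [1, 1], [-1, 1]]
v = [0.5 - 0.25j, -1.5 + 2j]
print(one_bit_matvec(A_re, A_im, v))
```

The result matches an ordinary complex matrix-vector product, which is why only the U scalings by $\alpha^*$ need high-resolution multipliers.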


To calculate pre-FAWP matrices that are WF-optimal, the problem in Equation 49 is solved by assuming that Q has the form given by Equation 53. By doing so, the following procedure is arrived at:


The problem in Equation 48 is equivalent to solving the following optimization problem for each UE $u=1,\ldots,U$:

$$a_u=\arg\min_{\tilde{a}\in\mathcal{X}^B}\ \frac{\|H\tilde{a}\|_2^2+\kappa^{WF}\|\tilde{a}\|_2^2}{|h_u^r\tilde{a}|^2}$$  Equation 54

Here, $a_u$ is the $u$th column of $A$, $h_u^r$ is the $u$th row of $H$, and the associated optimal scaling factor is given by:

$$\alpha_u(a_u)=\frac{h_u^r a_u}{\|Ha_u\|_2^2+\kappa^{WF}\|a_u\|_2^2}$$  Equation 55

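Equations 54 and 55 can be established by first plugging Equation 53 into Equation 49. Taking the Wirtinger derivative with respect to $\alpha_u$ then yields Equation 55, and substituting this optimal scaling factor back into the objective leaves the problem in Equation 54.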
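For very small antenna counts, the problem in Equation 54 can be solved by exhaustive search, which makes the link between Equations 49, 54, and 55 easy to check. The sketch below assumes NumPy, a 1-bit alphabet, and toy sizes chosen so that all $4^B$ candidates can be enumerated; this brute-force search is for illustration only and is not one of the methods of this disclosure.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
B, U = 4, 2       # tiny sizes so that 4**B candidates can be enumerated (assumption)
kappa = 0.1       # hypothetical value of kappa^WF
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)
ALPHABET = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # 1-bit alphabet {±1 ± j}

def ratio(a, u):
    """Objective of Equation 54 for candidate column a and UE u."""
    return (np.linalg.norm(H @ a) ** 2 + kappa * np.linalg.norm(a) ** 2) / abs(H[u] @ a) ** 2

def alpha(a, u):
    """Optimal per-UE scaling factor of Equation 55."""
    return (H[u] @ a) / (np.linalg.norm(H @ a) ** 2 + kappa * np.linalg.norm(a) ** 2)

def wf_objective(q, u):
    """Per-column WF objective of Equation 49."""
    e_u = np.eye(U)[:, u]
    return np.linalg.norm(e_u - H @ q) ** 2 + kappa * np.linalg.norm(q) ** 2

u = 0
candidates = [np.array(c) for c in itertools.product(ALPHABET, repeat=B)]
a_u = min(candidates, key=lambda a: ratio(a, u))   # exhaustive solution of Equation 54
q_u = a_u * np.conj(alpha(a_u, u))                 # column u of Q = A diag(alpha^*)

print(wf_objective(q_u, u), 1 - 1 / ratio(a_u, u))
```

The two printed values should agree: with the optimal scaling factor, the objective of Equation 49 equals one minus the reciprocal of the ratio in Equation 54, so minimizing the ratio also minimizes the WF objective.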


Post-FAWP Matrix: A post-FAWP matrix is defined as a B×U matrix with the structure:

$$Q=\mathrm{diag}(\zeta)Z^H$$  Equation 56

where $Z\in\mathcal{X}^{U\times B}$ is a low-resolution matrix with entries taken from the finite alphabet $\mathcal{X}$ and $\zeta\in\mathbb{C}^B$ is a vector with per-BS-antenna scaling factors.


By using a post-FAWP matrix, the matrix-vector product $Qs$ becomes $\mathrm{diag}(\zeta)(Z^H s)$. Such a matrix is called post-FAWP as the B scaling factors in $\zeta$ are applied after multiplying the matrix $Z^H$ with the vector $s$. Post-FAWP reduces the hardware complexity of $Qs$ since the B×U matrix-vector product $Z^H s$ can be implemented using exclusively low-resolution arithmetic units. The results of $Z^H s$ are then entry-wise scaled by $\zeta$, which requires only B high-resolution scalar multiplications.


Akin to the case of pre-FAWP matrices, post-FAWP matrices are obtained that are WF-optimal by solving the problem in Equation 52 with a matrix Q that has the form given in Equation 56. By doing so, the following procedure is arrived at:


The problem in Equation 51 is equivalent to solving the following optimization problem for each BS antenna $b=1,\ldots,B$:

$$z_b=\arg\min_{\tilde{z}\in\mathcal{X}^U}\ \frac{\|H^H\tilde{z}\|_2^2+\kappa^{WF}\|\tilde{z}\|_2^2}{|h_b^H\tilde{z}|^2}$$  Equation 57

Here, $z_b$ is the $b$th column of $Z$, $h_b$ is the $b$th column of $H$, and the associated optimal scaling factor is given by:

$$\zeta_b(z_b)=\frac{h_b^H z_b}{\|H^H z_b\|_2^2+\kappa^{WF}\|z_b\|_2^2}$$  Equation 58

In summary, both pre-FAWP and post-FAWP matrices are composed of a low-resolution matrix and a set of scaling factors. The difference is that a pre-FAWP matrix applies its U scaling factors before the multiplication with the low-resolution matrix, whereas a post-FAWP matrix applies its B scaling factors after matrix multiplication. As $B\gg U$ in typical massive MU-MIMO systems, a pre-FAWP matrix performs fewer high-resolution scaling operations than a post-FAWP matrix. However, the matrix-vector product is simpler with a post-FAWP matrix than with a pre-FAWP matrix, since the vector has a lower resolution as the symbols in $s$ are not scaled yet. Thus, neither pre-FAWP nor post-FAWP matrices have a clear advantage over the other in terms of hardware complexity. Nonetheless, both FAWP matrix structures are expected to reduce hardware complexity when compared to traditional precoding, as the low-resolution matrices in both structures have coarsely quantized entries.
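The order of operations in the two structures can be sketched as follows. The example assumes NumPy and uses random 1-bit matrices and random scaling factors purely as placeholders (they are not WF-optimal); the point is only to show where the U or B high-resolution scalings occur relative to the low-resolution matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(5)
B, U = 8, 3   # toy sizes (assumption)
s = rng.standard_normal(U) + 1j * rng.standard_normal(U)   # placeholder symbol vector

def random_one_bit(shape):
    """Placeholder low-resolution matrix with entries in {±1 ± j}."""
    return rng.choice([-1, 1], shape) + 1j * rng.choice([-1, 1], shape)

# Pre-FAWP, Q = A diag(alpha^*): U scalings first, then the low-resolution product.
A = random_one_bit((B, U))
alpha = rng.standard_normal(U) + 1j * rng.standard_normal(U)
x_pre = A @ (np.conj(alpha) * s)

# Post-FAWP, Q = diag(zeta) Z^H: low-resolution product first, then B scalings.
Z = random_one_bit((U, B))
zeta = rng.standard_normal(B) + 1j * rng.standard_normal(B)
x_post = zeta * (Z.conj().T @ s)

# Both factored forms agree with the explicit B x U matrices.
Q_pre = A @ np.diag(np.conj(alpha))
Q_post = np.diag(zeta) @ Z.conj().T
print(np.allclose(x_pre, Q_pre @ s), np.allclose(x_post, Q_post @ s))
```

In the pre-FAWP path the low-resolution matrix multiplies an already scaled (higher-resolution) vector, whereas in the post-FAWP path it multiplies the unscaled symbols and the B scalings follow.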


Computing FAWP Matrices: Different methods are proposed to compute pre-FAWP and post-FAWP matrices defined in Equation 53 and Equation 56, respectively. Means to estimate the precoding factor β are also discussed.


For pre-FAWP and post-FAWP matrices, the scaling factors are computed by means of Equation 55 and Equation 58, respectively, regardless of how the low-resolution matrix ($A$ for pre-FAWP and $Z$ for post-FAWP) is computed. Instead of solving the problems in Equation 54 or Equation 57, a simple approach is to directly quantize the infinite-precision matrix $Q^{WF}$. This approach is called FAWP-WF; more specifically, pre-FAWP-WF and post-FAWP-WF when applied to pre-FAWP and post-FAWP matrices, respectively.


$Q^{WF}$ is quantized following the method put forward in Section I. For pre-FAWP-WF, the maximum value $w_{\max}$ of $[|\Re\{q_u^{WF}\}|;|\Im\{q_u^{WF}\}|]$ is found for each column $q_u^{WF}$ of $Q^{WF}$. The range $[-w_{\max},w_{\max}]$ is then divided into uniform-width bins, where each bin is represented by its centroid value. The centroid values are scaled by the same factor so that they are integer numbers, which preserves the objective value in Equation 54 and results in the low-resolution entries of the column $a_u$. For post-FAWP-WF, the same procedure is applied on a per-row basis: each quantized row of $Q^{WF}$ corresponds to one row of $Z^H$.
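The per-column quantization step can be sketched as follows. This is one possible reading of the procedure, assuming NumPy, a bin count of $2^{\text{bits}}$ per real and imaginary part, and integer-scaled centroids $\pm1,\pm3,\ldots$; the function name `quantize_column` is hypothetical, and an all-zero input column is not handled.

```python
import numpy as np

def quantize_column(q, bits):
    """Uniformly quantize the real and imaginary parts of q.

    Returns integer-scaled bin centroids, i.e. odd integers in
    [-(2**bits - 1), 2**bits - 1] for both parts.
    """
    levels = 2 ** bits
    # Largest magnitude over all real and imaginary parts of the column.
    w_max = max(np.abs(q.real).max(), np.abs(q.imag).max())
    width = 2 * w_max / levels   # uniform bin width over [-w_max, w_max]

    def quant(v):
        # Bin index in 0 .. levels-1; the value w_max itself falls into the last bin.
        idx = np.clip(np.floor((v + w_max) / width), 0, levels - 1)
        # Centroid -w_max + (idx + 0.5) * width, rescaled by 2 / width to an odd integer.
        return 2 * idx - (levels - 1)

    return quant(q.real) + 1j * quant(q.imag)

q = np.array([0.9 + 0.1j, -0.2 - 0.6j, 0.4 - 0.9j])
print(quantize_column(q, 1))   # entries from {±1 ± j}
print(quantize_column(q, 2))   # entries with real and imaginary parts in {±1, ±3}
```

Because the objective in Equation 54 is invariant to a common scaling of the candidate vector, rescaling the centroids to integers does not change its value; the optimal scaling factor of Equation 55 absorbs the scale.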


Since the problems in Equation 54 and Equation 57 are NP-hard, FAWP-WF significantly reduces complexity. Concretely, FAWP-WF requires the same complexity of $\mathcal{O}(BU^2)$ as computing the infinite-precision $Q^{WF}$ in Equation 50. As a result, we will use FAWP-WF as a baseline to evaluate the performance of the algorithm proposed next, which tackles the problems in Equation 54 and Equation 57.


FAWP via Forward-Backward Splitting (FBS): Similar to finite-alphabet equalization matrices in Section I, the FAWP problems in Equation 54 and Equation 57 can also be approximately solved using FBS, an approach dubbed FAWP-FBS. Pre-FAWP-FBS is presented as an algorithm for computing the low-resolution part of a pre-FAWP matrix starting from the problem in Equation 54. The algorithm for post-FAWP matrices, dubbed post-FAWP-FBS, can be derived in a similar way starting from Equation 57.


As in Section I, it is assumed that the optimal objective value $\gamma_u$ of Equation 54, $u=1,\ldots,U$, is known. Then, solving the problem in Equation 54 is equivalent to solving the following problem:

$$a_u=\arg\min_{\tilde{a}\in\mathcal{X}^B}\ \frac{1}{2}\|H\tilde{a}\|_2^2+\frac{\kappa^{WF}}{2}\|\tilde{a}\|_2^2-\frac{\gamma_u}{2}|h_u^r\tilde{a}|^2$$  Equation 59

As γu is unknown, we will use it as a parameter that can be tuned to empirically improve the performance of our algorithm.


Next, the finite-alphabet constraint $\tilde{a}\in\mathcal{X}^B$ in Equation 59 is relaxed to $\tilde{a}\in\mathcal{B}^B$, where $\mathcal{B}$ represents the convex hull of $\mathcal{X}$. By doing so, the all-zeros vector $0_{B\times1}$ becomes a trivial solution. To avoid this solution, the term $-\frac{\delta}{2}\|\tilde{a}\|_2^2$, with $\delta>0$, is included in Equation 59 to encourage large entries in the vector $\tilde{a}$. The resulting optimization problem is:

$$a_u=\arg\min_{\tilde{a}\in\mathcal{B}^B}\ \frac{1}{2}\|H\tilde{a}\|_2^2-\frac{\gamma_u}{2}|h_u^r\tilde{a}|^2+\frac{\kappa^{WF}-\delta}{2}\|\tilde{a}\|_2^2$$  Equation 60

Now FBS can be applied. FBS is an efficient procedure for solving convex problems of the form $\hat{a}=\arg\min_{\tilde{a}}f(\tilde{a})+g(\tilde{a})$, where both functions $f$ and $g$ are convex, but $f$ is smooth and $g$ is not necessarily smooth or bounded. FBS is an iterative method that runs for $t_{\max}$ iterations or until convergence. In each iteration $t$, FBS computes:

$$\tilde{v}^{(t+1)}=\tilde{a}^{(t)}-\tau^{(t)}\nabla f(\tilde{a}^{(t)})$$  Equation 61
$$\tilde{a}^{(t+1)}=\mathrm{prox}_g(\tilde{v}^{(t+1)};\tau^{(t)})$$  Equation 62


where $\nabla f(\tilde{a}^{(t)})$ is the gradient of the function $f$ and $\{\tau^{(t)}>0\}$ is a sequence of step sizes. The proximal operator of the function $g$ is defined as

$$\mathrm{prox}_g(\tilde{v};\tau)=\arg\min_{\tilde{a}}\Big\{\tau g(\tilde{a})+\frac{1}{2}\|\tilde{a}-\tilde{v}\|_2^2\Big\}.$$

Since the problem in Equation 60 is non-convex, FBS is not guaranteed to converge to an optimal solution. Nevertheless, FBS is used to approximately solve Equation 60 by setting:

$$f(\tilde{a})=\frac{1}{2}\|H\tilde{a}\|_2^2-\frac{\gamma_u}{2}|h_u^r\tilde{a}|^2$$  Equation 63

$$g(\tilde{a})=\mathbb{I}_{\mathcal{B}^B}(\tilde{a})+\frac{\kappa^{WF}-\delta}{2}\|\tilde{a}\|_2^2$$  Equation 64

where $\mathbb{I}_{\mathcal{B}^B}(\tilde{a})$ is the indicator function, which is zero if $\tilde{a}\in\mathcal{B}^B$ and infinity otherwise. The indicator function is used to incorporate the convex constraint $\tilde{a}\in\mathcal{B}^B$ in Equation 60 into the function $g(\tilde{a})$. These choices for $f(\tilde{a})$ and $g(\tilde{a})$ result in:

$$\nabla f(\tilde{a})=H^H H\tilde{a}-\gamma_u(h_u^r)^H h_u^r\tilde{a}$$  Equation 65
$$\mathrm{prox}_g(\tilde{v})=\mathrm{sgn}(\Re\{\tilde{v}\})\min\{\nu^{(t)}|\Re\{\tilde{v}\}|,1\}+j\,\mathrm{sgn}(\Im\{\tilde{v}\})\min\{\nu^{(t)}|\Im\{\tilde{v}\}|,1\}$$  Equation 66

where $\nu^{(t)}=(1+\tau^{(t)}(\kappa^{WF}-\delta))^{-1}$ and Equation 66 is applied element-wise to $\tilde{v}$. Pre-FAWP-FBS can be summarized as follows:

Algorithm 2 (Pre-FAWP-FBS). Initialize $\tilde{a}^{(1)}$ with either the maximum-ratio transmission (MRT) solution $(h_u^r)^H$ or the pre-FAWP-WF solution $a_u^{WF}$, and fix the sets of parameters $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma^{(t)}\}$. Then, for every iteration $t=1,\ldots,t_{\max}$, compute:

$$\tilde{v}^{(t+1)}=\big(I_B-\tau^{(t)}H^H(I_U-\gamma^{(t)}e_u e_u^H)H\big)\tilde{a}^{(t)}$$  Equation 67
$$\tilde{a}^{(t+1)}=\mathrm{prox}_g(\tilde{v}^{(t+1)})$$  Equation 68

The result $\tilde{a}^{(t_{\max}+1)}$ is projected onto the finite alphabet $\mathcal{X}$ to obtain $a_u$. The optimal scalar $\alpha_u$ is computed using Equation 55. This procedure is repeated for each UE $u=1,\ldots,U$.

To tune the algorithm parameters $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma^{(t)}\}$, some embodiments use a neural-network-based approach. Note that $\gamma_u$ is replaced with $\gamma^{(t)}$ in Algorithm 2 in order to (i) keep the algorithm general for different user locations and (ii) increase flexibility during optimization.
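The iteration of pre-FAWP-FBS for the 1-bit alphabet can be sketched as follows. The sketch assumes NumPy and, for simplicity, constant values of $\tau$, $\nu$, and $\gamma$ across iterations; the particular values in the usage lines are arbitrary placeholders rather than tuned parameters, and the function names `prox_g` and `pre_fawp_fbs` are hypothetical.

```python
import numpy as np

def prox_g(v, nu):
    """Element-wise proximal step: scale by nu, then clip real/imag parts to [-1, 1]."""
    re = np.sign(v.real) * np.minimum(nu * np.abs(v.real), 1.0)
    im = np.sign(v.imag) * np.minimum(nu * np.abs(v.imag), 1.0)
    return re + 1j * im

def pre_fawp_fbs(H, u, kappa, tau, nu, gamma, t_max=10):
    """Sketch of pre-FAWP-FBS for UE u with constant tau, nu, gamma (assumption)."""
    U, B = H.shape
    e_u = np.zeros(U)
    e_u[u] = 1.0
    # Fixed iteration matrix of the gradient step (forward step).
    M = np.eye(B) - tau * H.conj().T @ (np.eye(U) - gamma * np.outer(e_u, e_u)) @ H
    a = H[u].conj()                      # MRT initialization (h_u^r)^H
    for _ in range(t_max):
        a = prox_g(M @ a, nu)            # forward step followed by proximal step
    # Project onto the 1-bit alphabet {±1 ± j}.
    a_u = np.where(a.real >= 0, 1.0, -1.0) + 1j * np.where(a.imag >= 0, 1.0, -1.0)
    # Optimal per-UE scaling factor for the projected column.
    denom = np.linalg.norm(H @ a_u) ** 2 + kappa * np.linalg.norm(a_u) ** 2
    alpha_u = (H[u] @ a_u) / denom
    return a_u, alpha_u

rng = np.random.default_rng(6)
U, B, kappa = 4, 16, 0.1                 # toy sizes and kappa^WF (assumptions)
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)
tau = 1.0 / np.linalg.norm(H, 2) ** 2    # placeholder step size based on the spectral norm
a_u, alpha_u = pre_fawp_fbs(H, 0, kappa, tau, nu=1.1, gamma=4.0)
print(a_u, alpha_u)
```

Running the function once per UE yields the columns of $A$ and the entries of $\alpha$; the quality of the result depends on the parameter values, which this sketch does not attempt to tune.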


Post-FAWP-FBS is now summarized, which can be derived following similar steps as for the derivation of pre-FAWP-FBS.

Algorithm 3 (Post-FAWP-FBS). Initialize $\tilde{z}^{(1)}$ with either the MRT solution $h_b$ or the post-FAWP-WF solution $z_b^{WF}$, and fix the sets of parameters $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma^{(t)}\}$. Then, for every iteration $t=1,\ldots,t_{\max}$, compute:

$$\tilde{v}^{(t+1)}=\big(I_U-\tau^{(t)}H(I_B-\gamma^{(t)}e_b e_b^H)H^H\big)\tilde{z}^{(t)}$$  Equation 69
$$\tilde{z}^{(t+1)}=\mathrm{prox}_g(\tilde{v}^{(t+1)})$$  Equation 70

The result $\tilde{z}^{(t_{\max}+1)}$ is projected onto the finite alphabet $\mathcal{X}$ to obtain $z_b$. The optimal scalar $\zeta_b$ is computed using Equation 58. This procedure is repeated for each BS antenna $b=1,\ldots,B$.

We note that both FAWP-FBS algorithms have the same complexity order of $\mathcal{O}(BU^2)$ as WF and FAWP-WF.


While the BS is able to compute the precoding factor $\beta$ via Equation 47 with a FAWP matrix $Q$ instead of $Q^{WF}$, the UEs need to estimate such precoding factor in order to correctly estimate the transmitted symbols in $s$. Estimation can be achieved in a block-fading scenario by transmitting a pilot symbol that is known at the UE side. Specifically, the BS will transmit the pilot $s_u=\sqrt{E_s}$, $u=1,\ldots,U$. Then, the $u$th UE will receive $y_u=\beta^{-1}h_u^r q_u s_u+\check{e}_u+n_u$, where $\check{e}_u$ represents residual interference from the other UEs. The objective now is for the UE to find a $\hat{\beta}_u\in\mathbb{R}^+$ such that it generates an unbiased estimate $\hat{s}_u$ of $s_u$, i.e., $\hat{s}_u=\hat{\beta}_u y_u\approx s_u$.


By taking into account that the transmitted pilot symbol $s_u$ is known to be $\sqrt{E_s}$ and by assuming that $\check{e}_u+n_u$ is zero-mean Gaussian distributed and independent of $s_u$, the UE can compute a maximum likelihood estimate (MLE) of $\hat{\beta}_u$ as:

$$\hat{\beta}_u^{\mathrm{MLE}}=\Re\big\{\sqrt{E_s}/y_u\big\}$$  Equation 71

While more pilots could be transmitted to form a better estimate $\hat{\beta}_u^{\mathrm{MLE}}$, the results in Subsection C below show that one pilot is sufficient to achieve reliable downlink communication.
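The single-pilot estimate and its use at the UE can be sketched in a few lines. The function names `estimate_beta` and `detect` and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def estimate_beta(y_pilot, Es):
    """Single-pilot estimate of the precoding factor, given the known pilot sqrt(Es)."""
    return (math.sqrt(Es) / y_pilot).real

def detect(y, beta_hat):
    """UE-side symbol estimate: scale the received sample by the estimated factor."""
    return beta_hat * y

Es = 1.0
true_beta = 2.5                                   # hypothetical precoding factor
y_pilot = math.sqrt(Es) / true_beta + 0.01j       # received pilot with a small disturbance
beta_hat = estimate_beta(y_pilot, Es)

# A later data sample, here the symbol (1 - 1j)/sqrt(2) received without disturbance.
y_data = ((1 - 1j) / math.sqrt(2)) / true_beta
print(beta_hat, detect(y_data, beta_hat))
```

With a disturbance-free pilot the estimate equals the true factor exactly; residual interference and noise on the pilot perturb the estimate, which is consistent with the loss reported below when the factor is estimated from a single pilot.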


C. Evaluation

Simulation results for both pre-FAWP and post-FAWP matrices are generated by either FAWP-WF or FAWP-FBS. A comparison is provided in terms of BER and EVM versus normalized transmit power, defined as P/N0. For simplicity, the evaluation is restricted to a mmWave system with B=256 BS antennas serving U=16 UEs in a 16-QAM system operating over an i.i.d. Rayleigh-fading channel.



FIG. 10A is a graphical representation of uncoded BER for a finite-alphabet precoder 24 assuming perfect knowledge of the UE-side precoding factor $\beta$. Each of the FAWP-based approaches uses 1-bit FAWP matrices. Pre- and post-FAWP-FBS are run for $t_{\max}=10$ iterations starting from the MRT solution $H^H$. FIG. 10A considers a case where the UEs have genie-aided access to the exact precoding scaling factor $\hat{\beta}_u$. Here, both FAWP-WF approaches result in an error floor. In fact, the FAWP-WF precoders are significantly outperformed by pre-FAWP-FBS, which computes WF-optimal pre-FAWP matrices. However, post-FAWP-FBS is unable to outperform post-FAWP-WF. Hence, post-FAWP-FBS is excluded in the ensuing discussion.



FIG. 10B is a graphical representation of uncoded BER for the finite-alphabet precoder 24 with estimated UE-side precoding factor $\beta$ using one pilot. This example uses the same scenario as FIG. 10A, but this time $\hat{\beta}_u$ is estimated from a single pilot transmission. As illustrated, all precoders (including the infinite-precision WF) suffer from roughly a 2 dB loss. In what follows, it is assumed that $\hat{\beta}_u$ is estimated using a single pilot.



FIG. 11A is a graphical representation of EVM for the different FAWP precoders of FIG. 10B and a 1-bit alphabet. FIG. 11B is a graphical representation of EVM for the different FAWP precoders of FIG. 10B and a 2-bit alphabet. FIG. 11C is a graphical representation of EVM for the different FAWP precoders of FIG. 10B and a 3-bit alphabet. The dashed horizontal lines represent the EVM requirements established by the 3GPP 5G NR technical specification. For {2,3}-bit FAWP, pre-FAWP-FBS is initialized with the MRT solution $H^H$ and runs for $t_{\max}=5$ iterations.



FIG. 11A confirms what was previously observed in FIGS. 10A-10B for the 1-bit alphabet. While FAWP-WF suffers a high error-floor that prevents such approach from reaching the EVM requirement even for QPSK, pre-FAWP-FBS almost meets the EVM requirement for 64-QAM. By increasing the number of bits used for the finite alphabet, the gap between the FAWP approaches and the infinite-precision WF decreases—to the point shown in FIG. 11C where all FAWP approaches meet the 64-QAM EVM requirement when using a 3-bit alphabet. It is interesting to observe that post-FAWP-WF outperforms pre-FAWP-WF when using finite alphabets with more than 1 bit. Nonetheless, post-FAWP-WF does not outperform pre-FAWP-FBS.


With reference to FIGS. 12A and 12B, FAWP is evaluated under more realistic mmWave propagation conditions. The QuaDRiGa mmWave channel model is used to simulate communication in the “mmMAGIC_UMi” scenario when using a 60 GHz carrier frequency for both non-LoS and LoS propagation conditions. The UEs are randomly placed 10 m to 110 m away from the BS in a 120° circular sector, with a minimum angular separation of 4°. Furthermore, perfect power control is assumed, i.e., all the users receive the same signal power.



FIG. 12A is a graphical representation of uncoded BER using the non-LoS QuaDRiGa mmWave channel model for the different FAWP precoders of FIGS. 11A-11C. FIG. 12B is a graphical representation of uncoded BER using the LoS QuaDRiGa mmWave channel model for the different FAWP precoders of FIGS. 11A-11C. The trends observed in the i.i.d. Rayleigh-fading scenario are confirmed. Pre-FAWP-FBS is able to outperform both FAWP-WF precoders, although the gains of the former (as well as the gap to the WF precoder) reduce when using more bits for the finite alphabet. An interesting observation is that, for the LoS scenario illustrated in FIG. 12B, the performance of pre-FAWP-WF is on par with that of post-FAWP-WF, which was not the case for the non-LoS and i.i.d. Rayleigh-fading scenarios. Regardless, the results in FIGS. 12A and 12B demonstrate that FAWP performs well with realistic mmWave channels, such that FAWP enables low-power and area-efficient precoding circuitry.


*****


FIG. 13 is a block diagram of the radio access node 20 of FIG. 2 suitable for implementing digital beamforming of signals according to embodiments disclosed herein. The radio access node 20 includes or is implemented as a computer system 1300, which comprises any computing or electronic device capable of including firmware, hardware, and/or executing software instructions that could be used to perform any of the methods or functions described above, such as FAME equalization and/or FAWP precoding. In this regard, the computer system 1300 may be a circuit or circuits included in an electronic board card, such as a printed circuit board (PCB), a server, a personal computer, a desktop computer, a laptop computer, an array of computers, a personal digital assistant (PDA), a computing pad, a mobile device, or any other device, and may represent, for example, a server or a user's computer.


The exemplary computer system 1300 in this embodiment includes a processing device 1302 or processor, a system memory 1304, and a system bus 1306. The system memory 1304 may include non-volatile memory 1308 and volatile memory 1310. The non-volatile memory 1308 may include read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The volatile memory 1310 generally includes random-access memory (RAM) (e.g., dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM)). A basic input/output system (BIOS) 1312 may be stored in the non-volatile memory 1308 and can include the basic routines that help to transfer information between elements within the computer system 1300.


The system bus 1306 provides an interface for system components including, but not limited to, the system memory 1304 and the processing device 1302. The system bus 1306 may be any of several types of bus structures that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and/or a local bus using any of a variety of commercially available bus architectures.


The processing device 1302 represents one or more commercially available or proprietary general-purpose processing devices, such as a microprocessor, central processing unit (CPU), or the like. More particularly, the processing device 1302 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or other processors implementing a combination of instruction sets. The processing device 1302 is configured to execute processing logic instructions for performing the operations and steps discussed herein.


In this regard, the various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with the processing device 1302, which may be a microprocessor, field programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or other programmable logic device, a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. Furthermore, the processing device 1302 may be a microprocessor, or may be any conventional processor, controller, microcontroller, or state machine. The processing device 1302 may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).


The computer system 1300 may further include or be coupled to a non-transitory computer-readable storage medium, such as a storage device 1314, which may represent an internal or external hard disk drive (HDD), flash memory, or the like. The storage device 1314 and other drives associated with computer-readable media and computer-usable media may provide non-volatile storage of data, data structures, computer-executable instructions, and the like. Although the description of computer-readable media above refers to an HDD, it should be appreciated that other types of media that are readable by a computer, such as optical disks, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the operating environment, and, further, that any such media may contain computer-executable instructions for performing novel methods of the disclosed embodiments.


An operating system 1316 and any number of program modules 1318 or other applications can be stored in the volatile memory 1310, wherein the program modules 1318 represent a wide array of computer-executable instructions corresponding to programs, applications, functions, and the like that may implement the functionality described herein in whole or in part, such as through instructions 1320 on the processing device 1302. The program modules 1318 may also reside on the storage mechanism provided by the storage device 1314. As such, all or a portion of the functionality described herein may be implemented as a computer program product stored on a transitory or non-transitory computer-usable or computer-readable storage medium, such as the storage device 1314, volatile memory 1308, non-volatile memory 1310, instructions 1320, and the like. The computer program product includes complex programming instructions, such as complex computer-readable program code, to cause the processing device 1302 to carry out the steps necessary to implement the functions described herein.


An operator, such as the user, may also be able to enter one or more configuration commands to the computer system 1300 through a keyboard, a pointing device such as a mouse, or a touch-sensitive surface, such as the display device, via an input device interface 1322 or remotely through a web interface, terminal program, or the like via a communication interface 1324. The communication interface 1324 may be wired or wireless and facilitate communications with any number of devices via a wireless communications system 18 in a direct or indirect fashion. An output device, such as a display device, can be coupled to the system bus 1306 and driven by a video port 1326. Additional inputs and outputs to the computer system 1300 may be provided through the system bus 1306 as appropriate to implement embodiments described herein.


The operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined.


Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.

Claims
  • 1. A method for digitally beamforming signals for an antenna array, the method comprising: estimating a wireless channel associated with a plurality of digital baseband signals and an antenna array to produce estimates of the wireless channel; and beamforming, with a finite-alphabet equalizer, the plurality of digital baseband signals based on the estimates of the wireless channel; wherein the plurality of digital baseband signals is beamformed at a given resolution less than a resolution of the estimates of the wireless channel.
  • 2. The method of claim 1, further comprising determining digital beamforming weights which are optimized for the given resolution; wherein beamforming the plurality of digital baseband signals comprises steering the plurality of digital baseband signals with the digital beamforming weights.
  • 3. The method of claim 2, further comprising adjusting the digital beamforming weights in response to new estimates of the wireless channel.
  • 4. The method of claim 1, further comprising receiving a sequential array of digital signal samples corresponding to the antenna array and comprising the plurality of digital baseband signals.
  • 5. The method of claim 4, wherein the sequential array of digital signal samples is received from analog-to-digital converter circuitry coupled to the antenna array.
  • 6. The method of claim 4, wherein beamforming the plurality of digital baseband signals comprises recovering received user equipment signals from the sequential array of digital signal samples.
  • 7. The method of claim 1, further comprising providing a sequential array of digital signal samples comprising the plurality of digital baseband signals to be transmitted via the antenna array.
  • 8. The method of claim 7, wherein providing the sequential array of digital signal samples comprises providing the sequential array of digital signal samples to digital-to-analog converter circuitry coupled to the antenna array.
  • 9. The method of claim 7, wherein beamforming the plurality of digital baseband signals comprises precoding each of the plurality of digital baseband signals by weighting the sequential array of digital signal samples.
  • 10. The method of claim 1, wherein the plurality of digital baseband signals is represented by a received channel matrix.
  • 11. The method of claim 10, wherein beamforming the plurality of digital baseband signals comprises performing a vector product of the received channel matrix and an equalization matrix.
  • 12. The method of claim 11, wherein: estimating the wireless channel comprises providing a channel matrix; and the method further comprises determining the equalization matrix based on the channel matrix.
  • 13. The method of claim 12, further comprising adjusting the equalization matrix to minimize a post-equalization mean square error of the plurality of digital baseband signals.
  • 14. A radio access node, comprising: an antenna array; channel estimator circuitry coupled to the antenna array and configured to provide an estimate of a wireless channel associated with a plurality of digital baseband signals for the antenna array; finite-alphabet beamforming circuitry coupled to the antenna array and the channel estimator circuitry, wherein: the finite-alphabet beamforming circuitry is configured to beamform the plurality of digital baseband signals in accordance with the estimate of the wireless channel; and the finite-alphabet beamforming circuitry comprises a finite-alphabet precoder configured to beamform a sequential array of digital signal samples; and processing circuitry coupled to the finite-alphabet precoder and configured to determine a precoder matrix applied to the sequential array of digital signal samples.
  • 15. The radio access node of claim 14, wherein: the finite-alphabet beamforming circuitry comprises a finite-alphabet equalizer configured to beamform the sequential array of digital signal samples; and the processing circuitry is coupled to the finite-alphabet equalizer and configured to determine an equalization matrix applied to the sequential array of digital signal samples.
  • 16. The radio access node of claim 15, further comprising analog-to-digital circuitry configured to provide the sequential array of digital signal samples corresponding to antennas of the antenna array.
  • 17. The radio access node of claim 14, further comprising digital-to-analog circuitry configured to provide analog signals to be transmitted by the antenna array from the sequential array of digital signal samples.
  • 18. The radio access node of claim 14, wherein the finite-alphabet beamforming circuitry has a resolution of 8 bits or less.
  • 19. The radio access node of claim 18, wherein the finite-alphabet beamforming circuitry has a resolution of 3 bits or less.
  • 20. The radio access node of claim 14, wherein the antenna array comprises at least 64 antennas.
  • 21. A radio access node, comprising: an antenna array; channel estimator circuitry coupled to the antenna array and configured to provide an estimate of a wireless channel associated with a plurality of digital baseband signals for the antenna array; and finite-alphabet beamforming circuitry coupled to the antenna array and the channel estimator circuitry, the finite-alphabet beamforming circuitry being configured to beamform the plurality of digital baseband signals in accordance with the estimate of the wireless channel and at a given resolution less than a resolution of the estimate of the wireless channel.
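
As a rough illustration of the beamforming recited in claims 1 and 11, the following Python sketch quantizes a full-resolution equalization matrix to a low-resolution alphabet and applies it to a received vector as a matrix-vector product. The uniform mid-rise quantizer, the function names, and the example values are assumptions made for illustration only. In particular, this is naive rounding of a precomputed matrix, not the determination of weights optimized for the given resolution recited in claim 2, and it assumes the matrix is not all zeros.

```python
import math

def quantize_component(x, bits, step):
    """Map a real value to the nearest of 2**bits uniform mid-rise levels."""
    max_idx = 2 ** bits / 2 - 0.5      # largest level index, e.g. 0.5 for 1 bit
    idx = math.floor(x / step) + 0.5   # levels sit at +/-0.5, +/-1.5, ... times step
    return max(-max_idx, min(max_idx, idx)) * step

def quantize_matrix(w, bits):
    """Quantize real and imaginary parts of every entry to 2**bits levels each."""
    peak = max(max(abs(e.real), abs(e.imag)) for row in w for e in row)
    step = 2 * peak / 2 ** bits        # assumption: alphabet spans [-peak, peak]
    return [[complex(quantize_component(e.real, bits, step),
                     quantize_component(e.imag, bits, step)) for e in row]
            for row in w]

def equalize(w_q, y):
    """Multiply the (UEs x antennas) equalization matrix by the received vector."""
    return [sum(w * s for w, s in zip(row, y)) for row in w_q]

# Illustrative values only: 2 UEs, 2 antennas, 1 bit per real/imaginary part.
w_full = [[0.9 + 0.1j, -0.4 - 0.8j], [0.2 - 0.6j, 0.7 + 0.3j]]
w_q = quantize_matrix(w_full, bits=1)
estimates = equalize(w_q, [1 + 0j, 1j])
```

In this sketch, a 1-bit alphabet leaves every matrix entry with the same magnitude in its real and imaginary parts, so the product reduces to sign changes, additions, and a single common scale factor. This is one way to see why a low-resolution equalization matrix can simplify the hardware.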
GOVERNMENT SUPPORT

This invention was made with government funds under Agreement No. HR0011-18-3-0004 awarded by The Defense Advanced Research Projects Agency (DARPA). The U.S. Government has certain rights in this invention.
