The present disclosure relates to equalization of radio frequency (RF) signals in multi-antenna RF systems and devices.
Future wireless systems are expected to deliver higher data-rates within the already crowded radio frequency (RF) spectrum. Emerging technologies, such as millimeter-wave (mmWave) or terahertz (THz) communication and massive multi-user multiple-input multiple-output (MU-MIMO), have risen as promising candidates to provide such high data rates. An abundance of available bandwidth at mmWave and THz frequencies, combined with fine-grained beamforming capabilities provided by massive MU-MIMO, can enable high-throughput communication to multiple user equipments (UEs) in a same time-frequency resource. However, these technologies require hundreds of antennas at a base station (BS), each receiving or transmitting wideband signals, which necessitates sophisticated RF and digital baseband processing circuitry. As a result, circuit power consumption and system costs may increase significantly, which may hamper the success of this technology.
To reduce power consumption, the literature has largely focused on multi-antenna mmWave and THz architectures that rely on hybrid analog-digital solutions. Albeit energy efficient, such architectures have limited multiplexing capabilities as they are only capable of simultaneously combining signals coming from a restricted number of directions; this key limitation may result in a reduced spectral efficiency. An emerging alternative is the use of all-digital BS architectures. While it is commonly believed that all-digital BS designs would be energy inefficient, it has been shown recently that the power consumption of the RF and data-conversion elements in an all-digital BS is comparable to that of hybrid solutions, provided that the resolution of the data converters at the BS is suitably reduced. However, power consumption and system costs of baseband processing for all-digital BS architectures are largely unexplored.
Unfortunately, for a BS 14 with a large array of antennas 16 serving multiple UEs 12, a conventional matrix-vector-product circuit consumes a large amount of power. For example, for a BS 14 with B=256 antennas 16 serving U=16 UEs 12, a conventional processing circuit operating at 2 G vectors/second consumes over 28 watts (W) and occupies more than 128 square millimeters (mm²) when implemented in a 28 nanometer (nm) complementary metal-oxide-semiconductor (CMOS) process. If more BS antennas 16 and/or more UEs 12 are considered, circuit power and area increase even further.
Finite-alphabet beamforming for multi-antenna wideband systems is provided. The combination of massive multi-user multiple-input multiple-output (MU-MIMO) technology and millimeter-wave (mmWave) communication enables unprecedentedly high data rates for radio frequency (RF) communications. In such systems, beamforming must be performed at extremely high rates over hundreds of antennas. For example, spatial equalization applies beamforming in the uplink to mitigate interference among user equipment (UEs) at a base station (BS). Conventional hardware designs of spatial equalizers in all-digital BSs, where each antenna is equipped with a pair of data converters, would entail prohibitively high power consumption and implementation costs. To address these issues, finite-alphabet equalization provides a new paradigm that restricts the entries of a spatial equalization matrix to low-resolution numbers, enabling high-throughput, low-power, and low-cost equalization hardware.
Similarly, precoding applies beamforming in the downlink to maximize the reception of a signal transmitted from a BS to a target UE. Finite-alphabet precoding can be applied in the downlink to similarly improve power and cost in precoding hardware.
To minimize the performance loss of finite-alphabet equalization, embodiments generate an equalization matrix using finite-alphabet minimum mean-square error (MMSE) equalization (FAME), which significantly outperforms a naive quantization of a linear MMSE matrix. A similar approach is applied to replace part of linear Wiener filter (WF) precoding matrices with a finite-alphabet WF precoding (FAWP) matrix for the downlink. Efficient algorithms can be deployed to approximate solutions to a non-deterministic polynomial-time (NP)-hard FAME/FAWP problem defined herein. Through this, it is shown that for massive MU-MIMO mmWave systems, near-optimal error-rate performance can be achieved with equalization coefficients quantized to only 1-3 bits. In addition, very-large scale integration (VLSI) results demonstrate a reduction in equalization power and area by factors of at least 3.9× and 5.8×, respectively, over traditional approaches.
An exemplary embodiment provides a method for digitally beamforming signals for an antenna array. The method includes estimating a wireless channel associated with a plurality of digital baseband signals and an antenna array to produce estimates for the wireless channel. The method further includes beamforming, with a finite-alphabet equalizer, the plurality of digital baseband signals based on the estimates of the wireless channel.
Another exemplary embodiment provides a radio access node. The radio access node includes an antenna array and channel estimator circuitry coupled to the antenna array and configured to provide an estimate of a wireless channel associated with a plurality of digital baseband signals for the antenna array. The radio access node further includes finite-alphabet beamforming circuitry coupled to the antenna array and the channel estimator circuitry, the finite-alphabet beamforming circuitry being configured to beamform the plurality of digital baseband signals in accordance with the estimation of the wireless channel.
Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including” when used herein specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In an exemplary aspect, the radio access node 20 includes an antenna array 26 (e.g., having B antennas), which can be operated for MU-MIMO communication with multiple UEs 28 (e.g., U UEs 28). Accordingly, the radio access node 20 is configured to perform spatial equalization in uplink communications (e.g., communications from the UEs 28 to a core network 30 of the wireless communications system 18). The purpose of spatial equalization is to beamform received signals by collecting the signals from all U UEs 28 at B antennas of the antenna array 26, while suppressing inter-UE interference. After this beamforming, these signals may be further processed via baseband processing circuitry 32 and forwarded to the core network 30. In some embodiments, the radio access node 20 is further configured to perform precoding in downlink communications (e.g., from the core network 30 to the UEs 28). Precoding similarly beamforms signals to be transmitted to the UEs 28 to maximize the reception of a transmitted signal at its target UE 28.
In an exemplary aspect, the antenna array 26 of the radio access node 20 includes a large number of B antennas (e.g., at least B=64 antennas). As described above, for such a large antenna array 26, a conventional matrix-vector-product circuit consumes a large amount of power. Consequently, the finite-alphabet beamforming circuitry (e.g., the finite-alphabet equalizer 22 and/or the finite-alphabet precoder 24) provides more efficient spatial equalization in order to minimize power consumption and semiconductor area (which translate to system costs), while achieving high spectral efficiency.
The finite-alphabet equalizer 22 and the finite-alphabet precoder 24 beamform digital baseband signals using matrix-vector products, based on a wireless channel estimate provided by channel estimator circuitry 34. The matrix-vector products required for spatial equalization and precoding involve multiplications and additions, where the hardware multipliers dominate power and area. The area and delay of a hardware multiplier scale with O(mn) and O(log(max{m, n})), respectively, where m and n are the numbers of bits of the two operands. Therefore, circuit area, delay, and power consumption (which is roughly proportional to circuit area) of a matrix-vector-product engine can be minimized by using a low number of bits to represent both operands.
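As a rough illustration of the scaling stated above, the following sketch compares relative multiplier cost for two operand widths. The bit widths are hypothetical and the constant factors hidden by the O(·) notation are ignored, so only the ratios are meaningful.

```python
import math

def multiplier_cost(m, n):
    """Relative area (~ m*n) and delay (~ log2(max(m, n))) of an m-bit by n-bit multiplier."""
    return m * n, math.log2(max(m, n))

# Hypothetical operand widths, chosen only for illustration.
area_hi, delay_hi = multiplier_cost(10, 10)   # high-resolution coefficient
area_lo, delay_lo = multiplier_cost(10, 2)    # low-resolution coefficient
area_ratio = area_hi / area_lo
```

Under this model, shrinking only one operand reduces area but not delay, since the delay term depends on the wider operand.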
In this regard, Section I below focuses on application of finite-alphabet equalization to uplink communications using the finite-alphabet equalizer 22. Section II adapts the finite-alphabet equalization approach to downlink communications using the finite-alphabet precoder 24.
Matrices and column vectors are represented with uppercase and lowercase boldface letters, respectively. The Hermitian transpose and the Frobenius norm of a matrix $\mathbf{A}$ are denoted by $\mathbf{A}^{H}$ and $\|\mathbf{A}\|_{F}$, respectively. The real part of a complex-valued matrix $\mathbf{A}$ is $\Re\{\mathbf{A}\}$ and the imaginary part is $\Im\{\mathbf{A}\}$. The $M \times M$ identity matrix is denoted by $\mathbf{I}_{M}$. The $k$th entry and the 2-norm of a vector $\mathbf{a}$ are $a_{k}$ and $\|\mathbf{a}\|_{2}$, respectively; the entry-wise complex conjugate is denoted by $\mathbf{a}^{*}$. The $k$th standard basis vector is represented by $\mathbf{e}_{k}$. The signum function $\mathrm{sgn}(\cdot)$ is defined as $\mathrm{sgn}(a)=+1$ for $a \geq 0$ and $\mathrm{sgn}(a)=-1$ for $a<0$ and is applied entry-wise to vectors. $\mathbb{E}_{\mathbf{x}}[\cdot]$ is used to denote expectation with respect to the random vector $\mathbf{x}$. The set of positive semidefinite matrices and the set of non-negative real numbers, $\mathbb{R}_{+}$, are each written with a subscript $+$.
With continuing reference to
To reduce power consumption and implementation costs of spatial equalization, the finite-alphabet equalizer 22 coarsely quantizes the coefficients of one or more spatial equalization matrices. In contrast to approaches that use low-resolution ADCs to quantize the received vector to be equalized, finite-alphabet equalization coarsely quantizes the entries of the spatial equalization matrix. While this approach appears straightforward, obtaining low-resolution finite-alphabet equalization matrices that achieve high spectral efficiency is a hard problem.
The finite-alphabet equalizer 22 uses a specific finite-alphabet equalization-matrix structure that reduces the complexity of a U×B matrix-vector product by using U×B low-resolution coefficients, while still delivering performance similar to that of conventional, high-resolution spatial equalization matrices, as further described in subsection A below. Subsection B below describes the so-called FAME problem, whose solution leads to finite-alphabet equalization matrices that minimize the post-equalization mean-square error (MSE). Subsection C presents a range of algorithms that approximately solve the NP-hard FAME problem; some of these algorithms achieve excellent performance even for 1-bit resolution, and some have very low complexity.
In subsection D, the finite-alphabet equalization approach is evaluated for line-of-sight (LoS) and non-LoS mmWave channel models; the evaluation demonstrates the efficacy of FAME in terms of error-vector magnitude (EVM), beamforming capabilities, and uncoded bit error rate (BER). In addition, finite-alphabet equalization circuits are implemented for different numbers of bits in a 28 nanometer (nm) complementary metal-oxide-semiconductor (CMOS) process to demonstrate the effectiveness of FAME in practice.
With continuing reference to
Focusing on the uplink of the wireless communications system 18 (e.g., a massive MU-MIMO system), the following narrowband input-output relation can be defined:
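$$\mathbf{y}=\mathbf{H}\mathbf{s}+\mathbf{n}$$ Equation 1
Here, $\mathbf{y} \in \mathbb{C}^{B}$ is the received signal vector at the radio access node 20, $\mathbf{H} \in \mathbb{C}^{B \times U}$ is the known uplink MIMO channel matrix, $\mathbf{s} \in \mathcal{O}^{U}$ is the transmit data vector, where $\mathcal{O}$ is the constellation set (e.g., 16-quadrature amplitude modulation (QAM)), and $\mathbf{n} \in \mathbb{C}^{B}$ is independent and identically distributed (i.i.d.) circularly-symmetric complex Gaussian noise with covariance matrix $\mathbf{C}_{\mathbf{n}}=\mathbb{E}_{\mathbf{n}}[\mathbf{n}\mathbf{n}^{H}]=N_{0}\mathbf{I}_{B}$, i.e., variance $N_{0}$ per complex entry. In what follows, it is assumed that the transmit signals of the UEs 28, $s_{u}$, $u=1,\ldots,U$, are i.i.d. zero mean with variance $E_{s}$ so that $\mathbf{C}_{\mathbf{s}}=\mathbb{E}_{\mathbf{s}}[\mathbf{s}\mathbf{s}^{H}]=E_{s}\mathbf{I}_{U}$.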
It should be understood that the input-output relation in Equation 1 is not only valid to model narrowband transmission, but can also be used to model the subcarriers of a wideband massive MU-MIMO system that uses orthogonal frequency-division multiplexing (OFDM) or single-carrier frequency-division multiple access (SC-FDMA). The theory and algorithms developed in the remainder of this disclosure can be generalized for systems with inter-symbol interference.
For this model, it is assumed that the channel remains constant over multiple symbol transmissions and, hence, can be estimated. For the mathematical derivations, it is further assumed that perfect channel state information is available at the radio access node 20. For systems in which the UEs 28 use an antenna array to perform transmit beamforming, the channel matrix H represents the joint effect of beamforming and the physical channel.
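The input-output relation in Equation 1 can be simulated with a few lines of NumPy. The sketch below is illustrative only: the array sizes, the noise level, and the i.i.d. Rayleigh-fading channel are assumptions made here and are not taken from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(1)
B, U = 64, 8               # antennas and UEs (illustrative sizes)
Es, N0 = 1.0, 0.05         # assumed symbol energy and noise variance

# 16-QAM constellation, normalized so that the average symbol energy is Es
levels = np.array([-3, -1, 1, 3])
const = (levels[:, None] + 1j * levels[None, :]).ravel()
const = const * np.sqrt(Es / np.mean(np.abs(const) ** 2))

# i.i.d. Rayleigh fading is used purely as a stand-in channel model
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
s = rng.choice(const, size=U)                                   # transmit data vector
n = np.sqrt(N0 / 2) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))
y = H @ s + n                                                   # Equation 1
```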
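A key task of the radio access node 20 is to form estimates $\hat{\mathbf{s}} \in \mathbb{C}^{U}$ of the transmitted data vector $\mathbf{s}$. To develop methods that are computationally efficient and hardware friendly, embodiments focus on linear spatial estimators of the form $\hat{\mathbf{s}}=\mathbf{W}^{H}\mathbf{y}$, where $\mathbf{W}^{H} \in \mathbb{C}^{U \times B}$ is the linear MMSE (L-MMSE) equalization matrix that minimizes the post-equalization MSE defined as:
$$\mathrm{MSE}=\mathbb{E}_{\mathbf{s},\mathbf{n}}\left[\|\mathbf{W}^{H}\mathbf{y}-\mathbf{s}\|_{2}^{2}\right]$$ Equation 2
Given the assumptions on the statistics of the transmit data and noise vectors, $\mathbf{s}$ and $\mathbf{n}$:
$$\mathrm{MSE}=E_{s}\|\mathbf{W}^{H}\mathbf{H}-\mathbf{I}_{U}\|_{F}^{2}+N_{0}\|\mathbf{W}^{H}\|_{F}^{2}$$ Equation 3
Hence, the L-MMSE equalization matrix can be obtained by solving the following matrix least-squares problem:
$$\mathbf{W}^{H}=\arg\min_{\tilde{\mathbf{W}}^{H} \in \mathbb{C}^{U \times B}} \|\mathbf{I}_{U}-\tilde{\mathbf{W}}^{H}\mathbf{H}\|_{F}^{2}+\rho\|\tilde{\mathbf{W}}^{H}\|_{F}^{2}$$ Equation 4
with regularization parameter $\rho=N_{0}/E_{s}$. This optimization problem has a closed-form solution given by:
$$\mathbf{W}^{H}=(\rho\mathbf{I}_{U}+\mathbf{H}^{H}\mathbf{H})^{-1}\mathbf{H}^{H}$$ Equation 5
which can be computed efficiently in hardware.
Alternatively, the rows $\mathbf{w}_{u}^{H}$, $u=1,\ldots,U$, of the L-MMSE equalization matrix $\mathbf{W}^{H}$ can be computed by solving:
$$\mathbf{w}_{u}=\arg\min_{\tilde{\mathbf{w}} \in \mathbb{C}^{B}} \|\mathbf{e}_{u}-\mathbf{H}^{H}\tilde{\mathbf{w}}\|_{2}^{2}+\rho\|\tilde{\mathbf{w}}\|_{2}^{2}$$ Equation 6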
This alternative formulation of the L-MMSE equalizer is useful in solving the FAME problem, as detailed in the next subsection.
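The agreement between the closed-form L-MMSE matrix of Equation 5 and the per-UE least-squares formulation of Equation 6 can be checked numerically. The following is a minimal sketch with assumed dimensions and an assumed i.i.d. Rayleigh-fading channel; the per-UE solution used here, $(\mathbf{H}\mathbf{H}^{H}+\rho\mathbf{I}_{B})^{-1}\mathbf{h}_{u}$, is the standard regularized least-squares solution and is not quoted from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
B, U = 16, 4               # antennas and UEs (small, illustrative sizes)
Es, N0 = 1.0, 0.1          # assumed symbol energy and noise variance
rho = N0 / Es

# Stand-in channel model: i.i.d. Rayleigh fading
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)

# Equation 5: W^H = (rho*I_U + H^H H)^{-1} H^H, using a linear solve instead of an explicit inverse
WH = np.linalg.solve(rho * np.eye(U) + H.conj().T @ H, H.conj().T)

# Equation 6, solved for all UEs at once: column u is w_u = (H H^H + rho*I_B)^{-1} h_u
W_cols = np.linalg.solve(H @ H.conj().T + rho * np.eye(B), H)
rows_match = np.allclose(WH, W_cols.conj().T)
```

The first form only requires solving a U×U system, which is the smaller of the two when B is much larger than U.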
Embodiments of the finite-alphabet equalizer 22 of
Linear equalization in hardware requires the computation of an inner product $\hat{s}_{u}=\langle\mathbf{w}_{u},\mathbf{y}\rangle=\mathbf{w}_{u}^{H}\mathbf{y}$ per UE for every received vector $\mathbf{y}$. As described above, executing even such simple computations at the bandwidth offered by mmWave systems can result in excessively large area and high power consumption. To reduce both the area and power consumption, embodiments of the finite-alphabet equalizer 22 reduce the numerical precision of the equalization vectors $\mathbf{w}_{u}$, $u=1,\ldots,U$. In the extreme case where the entries of $\mathbf{w}_{u}$ are quantized using 1 bit per real and imaginary component, an inner-product computation would only require additions and subtractions; this is significantly less costly (in area and power) than using high-precision multipliers.
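To make the 1-bit case concrete, the sketch below computes the inner product using only additions and subtractions and compares it with a conventional complex multiply-accumulate. The function name and the test vectors are hypothetical.

```python
def one_bit_inner_product(x, y):
    """Compute x^H y for x with entries in {+1+j, +1-j, -1+j, -1-j} using only adds and subtracts."""
    re = im = 0.0
    for xb, yb in zip(x, y):
        p, q = yb.real, yb.imag
        # conj(xb) * yb = (a - jc)(p + jq) = (a*p + c*q) + j(a*q - c*p), with a, c in {+1, -1}
        re += (p if xb.real > 0 else -p) + (q if xb.imag > 0 else -q)
        im += (q if xb.real > 0 else -q) - (p if xb.imag > 0 else -p)
    return complex(re, im)

x = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
y = [0.5 - 0.25j, -1.0 + 2.0j, 0.75 + 0.5j, 2.0 - 1.5j]
reference = sum(xb.conjugate() * yb for xb, yb in zip(x, y))
result = one_bit_inner_product(x, y)
```

Each sign selection corresponds to a conditional negation in hardware, so no multiplier is needed for the coefficient.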
However, reducing the numerical precision of the equalization vectors $\mathbf{w}_{u}$ will affect the MSE and, eventually, the error-rate performance. Furthermore, quantization to, e.g., the finite alphabet $\mathcal{X}=\{+1+j,+1-j,-1+j,-1-j\}$ will result in numerical-range issues, meaning that such matrices will not be able to represent large or small entries. To mitigate both of these issues, a principled way to perform equalization with finite-alphabet matrices is developed.
A U×B finite-alphabet equalization matrix is defined as follows:
$$\mathbf{V}^{H}=\mathrm{diag}(\boldsymbol{\beta}^{*})\mathbf{X}^{H}$$ Equation 7
Here, $\boldsymbol{\beta} \in \mathbb{C}^{U}$ is a vector that consists of post-equalization scaling factors and $\mathbf{X}^{H} \in \mathcal{X}^{U \times B}$ is an equalization matrix with entries chosen from the finite alphabet $\mathcal{X}$.
Embodiments of the finite-alphabet equalizer 22 use finite alphabets of low cardinality whose elements can be represented using a small number of bits (e.g., 8 bits or less). An example is the 1-bit finite alphabet $\mathcal{X}=\{+1+j,+1-j,-1+j,-1-j\}$, which uses 1 bit per real and imaginary component.
With Equation 7, the equalized received symbol for the uth UE 28 is given by:
$$\hat{s}_{u}=\mathbf{v}_{u}^{H}\mathbf{y}=\beta_{u}^{*}\mathbf{x}_{u}^{H}\mathbf{y}$$ Equation 8
where $\mathbf{v}_{u}^{H} \in \mathbb{C}^{1 \times B}$ and $\mathbf{x}_{u}^{H} \in \mathcal{X}^{1 \times B}$ are the $u$th rows of the matrices $\mathbf{V}^{H}$ and $\mathbf{X}^{H}$, respectively. The spatial equalization as in Equation 8 allows for efficient circuit implementations, especially for finite alphabets with low cardinality and regularly spaced elements. For such matrices, the inner product $\mathbf{x}_{u}^{H}\mathbf{y}$ can be implemented using low-resolution multipliers. As $\boldsymbol{\beta} \in \mathbb{C}^{U}$, the post-equalization scaling operation by the scalar factor $\beta_{u}^{*}$ is performed using high-resolution multipliers. Nonetheless, this operation is executed only once per UE 28. Section I-D below illustrates that equalizer implementations which leverage finite-alphabet equalization matrices enable significant area and power savings.
From the above equations, FAME is developed as a principled method to compute MSE-optimal finite-alphabet equalization matrices. Analogous to the derivation of the L-MMSE equalizer in Equation 6, FAME is interested in minimizing the post-equalization MSE:
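$$\text{FA-MSE}=\mathbb{E}_{\mathbf{s},\mathbf{n}}\left[\|\mathbf{V}^{H}\mathbf{y}-\mathbf{s}\|_{2}^{2}\right]$$ Equation 9
with the difference that $\mathbf{V}^{H}=\mathrm{diag}(\boldsymbol{\beta}^{*})\mathbf{X}^{H}$ is now a finite-alphabet equalization matrix as per Equation 7. From Equation 6, it follows that the rows $\mathbf{v}_{u}^{H}=\beta_{u}^{*}\mathbf{x}_{u}^{H}$, $u=1,\ldots,U$, of such a FAME matrix can be computed by solving the following optimization problem:
$$\{\beta_{u},\mathbf{x}_{u}\}=\arg\min_{\tilde{\beta} \in \mathbb{C},\ \tilde{\mathbf{x}} \in \mathcal{X}^{B}} \|\mathbf{e}_{u}-\mathbf{H}^{H}\tilde{\beta}\tilde{\mathbf{x}}\|_{2}^{2}+\rho\|\tilde{\beta}\tilde{\mathbf{x}}\|_{2}^{2}$$ Equation 10
Intuitively, embodiments seek to find the finite-alphabet equalization vectors $\mathbf{v}_{u}^{H}=\beta_{u}^{*}\mathbf{x}_{u}^{H}$, $u=1,\ldots,U$, that best mimic the infinite-precision L-MMSE equalizer.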
For a fixed scaling factor $\beta_{u}$, the FAME problem in Equation 10 is a closest vector problem, which is known to be NP-hard. For example, for a system with an antenna array 26 of B=256 antennas using a 1-bit finite-alphabet equalization matrix, solving the FAME problem using an exhaustive search would require one to evaluate the objective function in Equation 10 more than $10^{154}$ times for each UE 28. Clearly, without low-complexity algorithms, the FAME problem cannot be solved in practical massive MU-MIMO mmWave systems.
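The size of the exhaustive search follows from counting candidate vectors: each of the B entries can take any of the four values of the 1-bit alphabet, giving 4^B candidates. A short check with exact integer arithmetic:

```python
import math

B = 256                      # number of antennas in the example above
alphabet_size = 4            # 1-bit alphabet: {+1+j, +1-j, -1+j, -1-j}
num_candidates = alphabet_size ** B
exceeds = num_candidates > 10 ** 154
order_of_magnitude = B * math.log10(alphabet_size)   # log10 of the candidate count
```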
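Since the FAME problem in Equation 10 minimizes the cost function over two quantities at once, i.e., the scaling factor $\beta_{u}$ and the low-resolution vector $\mathbf{x}_{u}$, it is not obvious how to solve it efficiently. To derive computationally efficient algorithms in subsection C below, the following equivalent form of the FAME problem is used. For a fixed vector $\tilde{\mathbf{x}}$, the objective in Equation 10 is quadratic in $\tilde{\beta}$, so the scaling factor can be minimized out in closed form. The FAME problem in Equation 10 is therefore equivalent to solving the following optimization problem for each UE $u=1,\ldots,U$:
$$\mathbf{x}_{u}=\arg\min_{\tilde{\mathbf{x}} \in \mathcal{X}^{B}} \frac{\|\mathbf{H}^{H}\tilde{\mathbf{x}}\|_{2}^{2}+\rho\|\tilde{\mathbf{x}}\|_{2}^{2}}{|\mathbf{h}_{u}^{H}\tilde{\mathbf{x}}|^{2}}$$ Equation 11
where $\mathbf{h}_{u}$ is the $u$th column of $\mathbf{H}$ and the associated optimal scaling factor is given by:
$$\beta_{u}=\frac{\mathbf{x}_{u}^{H}\mathbf{h}_{u}}{\|\mathbf{H}^{H}\mathbf{x}_{u}\|_{2}^{2}+\rho\|\mathbf{x}_{u}\|_{2}^{2}}$$ Equation 12
This formulation of the FAME problem facilitates first finding the optimal vector $\mathbf{x}_{u}$ using Equation 11 and then computing the associated optimal scaling factor $\beta_{u}$ using Equation 12. Note that Equation 12 gives the MSE-optimal scaling factor $\beta_{u}$ for a given vector $\mathbf{x}_{u}$, regardless of how $\mathbf{x}_{u}$ was computed.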
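The MSE-optimal scaling for a fixed low-resolution vector can be verified numerically. In the sketch below, the closed-form scaling is obtained by setting the derivative of the Equation 10 objective with respect to the scaling factor to zero; the helper names, the dimensions, and the Rayleigh-fading channel are assumptions made for illustration. The 1-bit vector is produced by taking signs of an L-MMSE row, purely to have a candidate vector to scale.

```python
import numpy as np

def fame_objective(beta, x, H, u, rho):
    """Per-UE objective of Equation 10 evaluated for the candidate vector beta * x."""
    e_u = np.zeros(H.shape[1])
    e_u[u] = 1.0
    v = beta * x
    return np.linalg.norm(e_u - H.conj().T @ v) ** 2 + rho * np.linalg.norm(v) ** 2

def optimal_beta(x, H, u, rho):
    """Scaling that minimizes the objective for fixed x: x^H h_u / (||H^H x||^2 + rho ||x||^2)."""
    denom = np.linalg.norm(H.conj().T @ x) ** 2 + rho * np.linalg.norm(x) ** 2
    return np.vdot(x, H[:, u]) / denom       # np.vdot conjugates its first argument

rng = np.random.default_rng(2)
B, U, rho = 32, 4, 0.1                       # illustrative sizes and regularization
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)

# Candidate 1-bit vector for UE 0: signs of the real and imaginary parts of the L-MMSE vector
WH = np.linalg.solve(rho * np.eye(U) + H.conj().T @ H, H.conj().T)
w0 = WH[0].conj()
x0 = np.where(w0.real >= 0, 1.0, -1.0) + 1j * np.where(w0.imag >= 0, 1.0, -1.0)

beta0 = optimal_beta(x0, H, 0, rho)
best = fame_objective(beta0, x0, H, 0, rho)
```

Perturbing `beta0` in magnitude or phase increases the objective, which is a quick sanity check on the closed form.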
FL-MMSE: A Baseline Finite-Alphabet Equalizer: Since the FAME problem is NP-hard, a baseline method is presented to compute finite-alphabet equalization matrices as in Equation 7 without having to solve the FAME problem in Equation 11. This approach is referred to as FL-MMSE, as it obtains the entries of the low-resolution matrix XH by quantizing the L-MMSE equalizer in Equation 5. The corresponding scaling factors βu are then obtained using Equation 12. Throughout this disclosure, the FL-MMSE equalizer is used as a baseline method to evaluate the performance of embodiments of the finite-alphabet equalizer 22 that attempt to directly solve the FAME problem in Equation 11.
For the 1-bit case, FL-MMSE applies the signum function sgn(⋅) separately on the real and imaginary parts of the L-MMSE matrix WH to obtain XH. Then, FL-MMSE uses the optimal FAME scaling in Equation 12 to compute the high-resolution scaling factors in the vector β. FL-MMSE can also be used with finite-alphabets that have more than 1-bit per complex entry. In such cases, after computing the L-MMSE equalization matrix WH in Equation 5, the real and imaginary parts are quantized as follows.
For each row $\mathbf{w}_{u}^{H}$ of $\mathbf{W}^{H}$, the scalar $w_{\max}$ corresponding to the largest absolute value in $[\Re\{\mathbf{w}_{u}^{H}\},\Im\{\mathbf{w}_{u}^{H}\}]$ is identified. Then, assuming that the targeted resolution is $r$ bits, the range $[-w_{\max},+w_{\max}]$ is divided into $2^{r}$ uniform-width bins and the entries of $\Re\{\mathbf{w}_{u}^{H}\}$ and $\Im\{\mathbf{w}_{u}^{H}\}$ are quantized to the centroid values of these bins. For 2-bit resolution, for example, the centroid values of the bins are $\{-0.75,-0.25,+0.25,+0.75\}\,w_{\max}$. In hardware, one would scale these centroid values so that the minimum absolute value corresponds to 1. Following the previous example, one would use the values $\{-3,-1,+1,+3\}$ to represent the entries of $\Re\{\mathbf{x}_{u}^{H}\}$ and $\Im\{\mathbf{x}_{u}^{H}\}$. Note that this scaling does not affect the solution of the FAME problem in Equation 11, as it can be absorbed into the scaling factor $\beta_{u}$ in Equation 12. After obtaining the low-resolution vector, the corresponding scaling factor $\beta_{u}$ is computed using Equation 12.
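The row-wise quantization described above can be sketched as follows. The function name and the example row are hypothetical; the sketch maps each real and imaginary part to the odd-integer representation of its bin centroid and assumes the row is not all zeros.

```python
import numpy as np

def quantize_row(w_row, r):
    """Quantize the real and imaginary parts of one equalizer row to r bits each (odd-integer levels)."""
    w_max = max(np.max(np.abs(w_row.real)), np.max(np.abs(w_row.imag)))
    n_bins = 2 ** r

    def q(v):
        # Index of the uniform-width bin in [-w_max, +w_max] that contains each entry
        idx = np.floor((v + w_max) / (2 * w_max) * n_bins).astype(int)
        idx = np.clip(idx, 0, n_bins - 1)        # v == +w_max belongs to the top bin
        return 2 * idx - (n_bins - 1)            # e.g. r = 2 gives {-3, -1, +1, +3}

    return q(w_row.real) + 1j * q(w_row.imag)

row = np.array([0.8 - 0.1j, -0.3 + 0.5j, 0.1 + 0.2j, -0.8 - 0.45j])
x2 = quantize_row(row, 2)    # 2-bit quantization
x1 = quantize_row(row, 1)    # 1-bit quantization reduces to the signum function
```

The common factor between the integer levels and the true centroids is left to the per-UE scaling factor, as the text notes.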
While the infinite-precision L-MMSE equalizer achieves an EVM of 11.58%, quantizing its solution to 1-bit using FL-MMSE degrades the EVM to 30.58%, which blurs the decision regions of the considered 16-QAM constellation. In stark contrast, the 1-bit FAME-EXH equalizer achieves an EVM of only 15.30%, which is close to that of the infinite-precision L-MMSE equalizer; furthermore, the decision regions between constellation points are clearly visible. These results demonstrate the significant EVM advantage of solving the FAME problem over the simple FL-MMSE equalizer.
$$h_{b}(\phi)=e^{-j\pi(b-1)\cos(\phi)},\quad b=1,\ldots,B$$ Equation 13
Here, a uniform linear array (ULA) of antennas with half-wavelength antenna spacing and constant path loss is assumed. A primary UE 28 is located at an angle of $\phi_{1}=60°$ and a secondary UE 28 is located at $\phi_{2}=120°$. Next, the corresponding equalization matrix is computed using L-MMSE, FL-MMSE, and FAME-EXH equalization. An evaluation is performed of how much the equalization vector $\mathbf{v}_{1}^{H}$ (which corresponds to the UE at $\phi_{1}=60°$) captures (or rejects) signals incoming from different incident angles by evaluating $|\mathbf{v}_{1}^{H}\mathbf{h}(\phi')|^{2}$ for $0 \leq \phi' \leq \pi$. The equalization vector $\mathbf{v}_{1}^{H}$ should amplify the signal from the primary UE 28 at $\phi_{1}=60°$ but attenuate the signal from the secondary UE 28.
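The beam-pattern evaluation described above can be sketched for the infinite-precision L-MMSE equalizer as follows. The array size B=8 and the regularization value are assumptions made here; any other equalization vector (for example, a finite-alphabet one) could be evaluated with the same loop.

```python
import numpy as np

def steering(phi, B):
    """Equation 13: LoS response of a half-wavelength ULA for incident angle phi (radians)."""
    b = np.arange(B)                               # b - 1 for b = 1, ..., B
    return np.exp(-1j * np.pi * b * np.cos(phi))

B, rho = 8, 0.1                                    # hypothetical array size and regularization
phi1, phi2 = np.deg2rad(60.0), np.deg2rad(120.0)   # primary and secondary UE angles
H = np.column_stack([steering(phi1, B), steering(phi2, B)])

# L-MMSE equalization matrix (Equation 5); row 0 serves the primary UE
WH = np.linalg.solve(rho * np.eye(2) + H.conj().T @ H, H.conj().T)
v1H = WH[0]

# |v_1^H h(phi')|^2 over incident angles between 0 and pi
phis = np.linspace(0.0, np.pi, 181)
pattern = np.array([abs(v1H @ steering(p, B)) ** 2 for p in phis])
gain_primary = abs(v1H @ steering(phi1, B)) ** 2
gain_secondary = abs(v1H @ steering(phi2, B)) ** 2
```

For this particular geometry with an even number of antennas, the two steering vectors are orthogonal, so the L-MMSE vector places a null on the secondary UE.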
The results shown in
Despite the significant performance advantages of 1-bit FAME-EXH over 1-bit FL-MMSE, solving 1-bit FAME-EXH for large-dimensional problems that arise in mmWave systems is infeasible in practice. To this end, low-complexity FAME solvers are developed that scale to large antenna arrays 26.
Approximate algorithms are presented to solve the FAME problem efficiently for a radio access node 20 with a large antenna array 26. This begins with proposing a semidefinite relaxation (SDR)-based method and then developing a much faster method that uses forward-backward splitting (FBS).
FAME with SDR: First, SDR is used to solve the FAME problem in Equation 11 for a 1-bit finite alphabet. To do so, the FAME problem is re-expressed in the real domain using the quantities:
Throughout, it is assumed that $\Re\{\mathbf{x}\}$ and $\Im\{\mathbf{x}\}$ take values from the same real-valued alphabet. For example, for 1-bit finite alphabets, this real-valued alphabet is $\{-1,+1\}$. The FAME problem in Equation 11 can now be rewritten as:
It is now key to realize that the optimization vector can be scaled arbitrarily without changing the objective function of Equation 15. This observation enables an equivalent optimization problem to be stated:
where the discrete set is a scaled version of the real-valued alphabet; for 1-bit finite alphabets, it is $\{-\alpha,+\alpha\}$ with $\alpha>0$. This form enables a semidefinite program to be formulated that solves the FAME problem approximately.
By focusing on 1-bit finite alphabets, Equation 16 can be relaxed by replacing the constraint |2=1 by =1, where the positive semidefinite matrix
where the diagonal elements of
While FAME-SDR can also be derived for multi-bit finite alphabets, this approach is not pursued for the following reasons. As described further below, the complexity of FAME-SDR does not scale well to a large number of BS antennas. Moreover, FAME-SDR cannot be applied to finite alphabets that are not separable into real and imaginary parts, such as a finite alphabet that contains the elements of an 8-phase shift keying (PSK) constellation. In addition, SDR can only handle finite alphabets with even cardinality that exclude a zero element. To avoid the drawbacks of SDR for FAME, an alternative approach is presented.
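FAME with FBS: Due to the high complexity of FAME-SDR and the fact that SDR solvers are notoriously difficult to implement in hardware, a low-complexity alternative for solving the FAME problem approximately is presented next. To do so, it is assumed that, for each UE $u=1,\ldots,U$, the optimal value of the objective in Equation 11 is known and denoted by $\gamma_{u}$. Mathematically:
$$\gamma_{u}=\frac{\|\mathbf{H}^{H}\mathbf{x}_{u}\|_{2}^{2}+\rho\|\mathbf{x}_{u}\|_{2}^{2}}{|\mathbf{h}_{u}^{H}\mathbf{x}_{u}|^{2}}$$ Equation 18
where $\mathbf{x}_{u}$ is the solution to the problem in Equation 11 and $\mathbf{h}_{u}$ is the $u$th column of $\mathbf{H}$. Note that it follows from Equation 18 that $\gamma_{u}>1$. Rearranging Equation 18 yields:
$$0=\|\mathbf{H}^{H}\mathbf{x}_{u}\|_{2}^{2}+\rho\|\mathbf{x}_{u}\|_{2}^{2}-\gamma_{u}|\mathbf{h}_{u}^{H}\mathbf{x}_{u}|^{2}$$ Equation 19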
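Thus, if $\gamma_{u}$ were known, solving the problem:
$$\mathbf{x}_{u}=\arg\min_{\tilde{\mathbf{x}} \in \mathcal{X}^{B}} \frac{1}{2}\|\mathbf{H}^{H}\tilde{\mathbf{x}}\|_{2}^{2}+\frac{\rho}{2}\|\tilde{\mathbf{x}}\|_{2}^{2}-\frac{\gamma_{u}}{2}|\mathbf{h}_{u}^{H}\tilde{\mathbf{x}}|^{2}$$ Equation 20
would yield the same solution as Equation 11; the common factor of 1/2 relative to Equation 19 does not affect the minimizer. As the value of $\gamma_{u}$ is unknown in practice, it is used as an algorithm parameter that is tuned to empirically improve the error-rate performance.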
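Since the problem in Equation 20 still contains a search over the finite alphabet $\mathcal{X}^{B}$, the non-convex constraint $\tilde{\mathbf{x}} \in \mathcal{X}^{B}$ is relaxed to $\tilde{\mathbf{x}} \in \mathcal{B}^{B}$. Here, $\mathcal{B}$ corresponds to the convex hull of the finite alphabet $\mathcal{X}$, which is defined as:
$$\mathcal{B}=\left\{\sum_{i=1}^{|\mathcal{X}|}\alpha_{i}x_{i} \;:\; \alpha_{i} \in \mathbb{R}_{+}\ \forall i,\ \sum_{i=1}^{|\mathcal{X}|}\alpha_{i}=1\right\}$$ Equation 21
where $x_{i}$ is the $i$th element of $\mathcal{X}$. In addition, the term $-\frac{\delta}{2}\|\tilde{\mathbf{x}}\|_{2}^{2}$ is added to the objective function, where $\delta>0$ is a regularization parameter. The resulting optimization problem is given by:
$$\hat{\mathbf{x}}_{u}=\arg\min_{\tilde{\mathbf{x}} \in \mathcal{B}^{B}} \frac{1}{2}\|\mathbf{H}^{H}\tilde{\mathbf{x}}\|_{2}^{2}-\frac{\gamma_{u}}{2}|\mathbf{h}_{u}^{H}\tilde{\mathbf{x}}|^{2}+\frac{\rho-\delta}{2}\|\tilde{\mathbf{x}}\|_{2}^{2}$$ Equation 22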
FBS is used to compute a solution to Equation 22. FBS is an efficient, iterative solver for convex optimization problems of the form:
$$\hat{\mathbf{x}}=\arg\min_{\tilde{\mathbf{x}}}\ f(\tilde{\mathbf{x}})+g(\tilde{\mathbf{x}})$$ Equation 23
where both functions $f$ and $g$ are convex, but $f$ is a smooth function and $g$ is not necessarily smooth or bounded. FBS executes the following operations for $t=1,2,\ldots,t_{\max}$ iterations or until convergence:
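$$\tilde{\mathbf{z}}^{(t+1)}=\tilde{\mathbf{x}}^{(t)}-\tau^{(t)}\nabla f(\tilde{\mathbf{x}}^{(t)})$$ Equation 24
$$\tilde{\mathbf{x}}^{(t+1)}=\mathrm{prox}_{g}(\tilde{\mathbf{z}}^{(t+1)};\tau^{(t)})$$ Equation 25
Here, $\nabla f(\tilde{\mathbf{x}}^{(t)})$ is the gradient of the function $f$, $\{\tau^{(t)}>0\}$ is a sequence of step sizes, and $\mathrm{prox}_{g}(\cdot)$ is the proximal operator of the function $g$, defined as:
$$\mathrm{prox}_{g}(\tilde{\mathbf{z}};\tau)=\arg\min_{\tilde{\mathbf{x}}}\ \left\{\tau g(\tilde{\mathbf{x}})+\frac{1}{2}\|\tilde{\mathbf{x}}-\tilde{\mathbf{z}}\|_{2}^{2}\right\}$$ Equation 26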
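The problem in Equation 22 is not convex and hence, FBS is not guaranteed to converge to an optimal solution. Nevertheless, FBS can be used to find approximate solutions to Equation 22 by setting:
$$f(\tilde{\mathbf{x}})=\frac{1}{2}\|\mathbf{H}^{H}\tilde{\mathbf{x}}\|_{2}^{2}-\frac{\gamma_{u}}{2}|\mathbf{h}_{u}^{H}\tilde{\mathbf{x}}|^{2}$$ Equation 27
$$g(\tilde{\mathbf{x}})=\frac{\rho-\delta}{2}\|\tilde{\mathbf{x}}\|_{2}^{2}+\chi_{\mathcal{B}^{B}}(\tilde{\mathbf{x}})$$ Equation 28
Here, the convex constraint $\tilde{\mathbf{x}} \in \mathcal{B}^{B}$ in Equation 22 is incorporated into the function $g(\tilde{\mathbf{x}})$ via the indicator function $\chi_{\mathcal{B}^{B}}(\tilde{\mathbf{x}})$, which is zero if $\tilde{\mathbf{x}} \in \mathcal{B}^{B}$ and infinity otherwise. With these definitions: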
∀ƒ({tilde over (x)})=HHH{tilde over (x)}−γuhuhuH{tilde over (x)} Equation 29
proxg({tilde over (z)})=sgn({{tilde over (z)}})min{v(t)|{{tilde over (z)}}|,1}+j sgn(ℑ{{tilde over (z)}})min{ν(t)|ℑ{{tilde over (z)}}|,1} Equation 30
where ν(t)=(1+τ(t)(ρ−δ))−1 and Equation 30 is applied element-wise to the vector {tilde over (z)}.
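As a worked check, one choice of $f$ and $g$ that is consistent with Equation 29 and Equation 30 is sketched below. The factors of $\tfrac{1}{2}$, the symbol $\chi$ for the indicator function, and the use of the standard definition of the proximal operator are assumptions made for this sketch; they are inferred from the gradient and the proximal operator above and may differ from the exact expressions of the embodiment.

```latex
\begin{align*}
f(\tilde{x}) &= \tfrac{1}{2}\|H^H\tilde{x}\|_2^2 - \tfrac{\gamma_u}{2}\,|h_u^H\tilde{x}|^2
  \;\Rightarrow\; \nabla f(\tilde{x}) = HH^H\tilde{x} - \gamma_u h_u h_u^H\tilde{x}, \\
g(\tilde{x}) &= \chi_{\mathcal{B}^B}(\tilde{x}) + \tfrac{\rho-\delta}{2}\,\|\tilde{x}\|_2^2, \\
\mathrm{prox}_g(\tilde{z};\tau) &= \arg\min_{\tilde{x}\in\mathcal{B}^B}
  \Big\{ \tfrac{\tau(\rho-\delta)}{2}\,\|\tilde{x}\|_2^2 + \tfrac{1}{2}\,\|\tilde{x}-\tilde{z}\|_2^2 \Big\}.
\end{align*}
```

Without the box constraint, the last minimization has the closed-form solution $\nu\tilde{z}$ with $\nu = (1+\tau(\rho-\delta))^{-1}$. Since the objective separates over the real and imaginary parts of each entry and is convex whenever $1+\tau(\rho-\delta) > 0$, the constrained minimizer is obtained by clipping the real and imaginary parts of $\nu\tilde{z}$ to $[-1, +1]$, which is the operation in Equation 30.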
Note that three sets of algorithm parameters have been introduced: $\{\tau^{(t)}\}$, $\{\nu^{(t)}\}$, and $\{\gamma_u\}$, where $t = 1, \ldots, t_{\max}$ and $u = 1, \ldots, U$. In some examples, these parameters may be tuned manually. To avoid manual tuning, other embodiments apply a neural-network-based approach. As the same algorithm parameters should work across many channel realizations, a per-UE parameter such as $\{\gamma_u\}$ is not meaningful. As a result, embodiments set $\gamma = \gamma_u$ for $u = 1, \ldots, U$. Furthermore, to provide the neural network with greater flexibility during optimization, $\gamma$ is allowed to differ in each iteration; i.e., another set of per-iteration parameters $\{\gamma^{(t)}\}$, $t = 1, \ldots, t_{\max}$, is introduced. The resulting algorithm is referred to as FAME-FBS, which is summarized as follows:
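The iterations in Equations 24, 25, 29, and 30 can be sketched in Python with NumPy as follows. This is an illustrative sketch only, not the listing of the embodiment: the function names, the initialization from the scaled channel vector, the parameter values, and the sign-based 1-bit quantizer are all assumptions chosen for the example.

```python
import numpy as np

def prox_box(z, nu):
    """Equation 30: scale by nu, then clip real and imaginary parts to [-1, 1]."""
    re = np.sign(z.real) * np.minimum(nu * np.abs(z.real), 1.0)
    im = np.sign(z.imag) * np.minimum(nu * np.abs(z.imag), 1.0)
    return re + 1j * im

def fame_fbs(H, u, tau, nu, gamma):
    """Run the FBS iterations for UE u; H is the B x U channel matrix."""
    h_u = H[:, u]
    # Assumed initialization: the UE's channel vector scaled into the unit box.
    x = h_u / max(np.abs(h_u.real).max(), np.abs(h_u.imag).max())
    for tau_t, nu_t, gamma_t in zip(tau, nu, gamma):
        # Equation 29 with the per-iteration parameter gamma^(t).
        grad = H @ (H.conj().T @ x) - gamma_t * h_u * np.vdot(h_u, x)
        z = x - tau_t * grad      # Equation 24 (forward step)
        x = prox_box(z, nu_t)     # Equation 25 (backward step)
    return x

def quantize_1bit(x):
    """Assumed 1-bit quantizer: keep only the signs of the real and imaginary parts."""
    return np.where(x.real >= 0, 1.0, -1.0) + 1j * np.where(x.imag >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
B, U, t_max = 32, 4, 10
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
tau = np.full(t_max, 2.0 ** -7)   # hand-picked placeholders, not tuned values
nu = np.full(t_max, 1.1)
gamma = np.full(t_max, 2.0)
x_relaxed = fame_fbs(H, 0, tau, nu, gamma)
x_1bit = quantize_1bit(x_relaxed)
```

Each iteration costs two matrix-vector products with $H$, and the loop for one UE does not depend on any other UE, so the $U$ problems could be run in parallel.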
FAME-FBS supports multi-bit finite alphabet equalization matrices. This is achieved by uniformly quantizing, in the range [−1, +1], the real and imaginary parts of the solution vector {tilde over (x)}(t
Computational Complexity: The complexity of (i) computing the equalization matrix and (ii) performing equalization on a received vector y is assessed for both high-resolution and finite-alphabet equalization approaches. Computational complexity is measured as the number of real-valued multiplications performed by an algorithm.
Table I lists the computational complexity for computing a single equalization matrix using L-MMSE, FL-MMSE, FAME-SDR, and FAME-FBS. For the infinite-precision L-MMSE equalizer, the complexity corresponds to explicitly computing the equalization matrix $W^H$. For the finite-alphabet equalizers (FL-MMSE and FAME-based algorithms), the complexity corresponds to the computation of the low-resolution matrix $X^H$ and the scaling factors in the vector $\beta$. Solving FAME-SDR results in the highest complexity, which asymptotically scales as $O(B^{4.5})$ unless specific problem structures can be exploited. Consequently, FAME-SDR does not scale well to systems with a large antenna array 26. FAME-FBS has the same asymptotic scaling of $O(BU^2)$ as L-MMSE and FL-MMSE equalization, making it suitable for massive MU-MIMO mmWave systems.
While the constant associated with the term $BU^2$ is larger for FAME-FBS than for L-MMSE and FL-MMSE, the complexity of the latter algorithms appears to be higher in practice. Computing the L-MMSE (and the FL-MMSE) equalizer in hardware requires square roots and divisions, which result in high numerical precision requirements. Furthermore, the Cholesky decomposition and forward- and back-substitution procedures required when computing the L-MMSE (and the FL-MMSE) equalization matrix result in stringent data dependencies that limit parallelism and, hence, reduce throughput. In contrast, FAME-FBS has a regular structure with few data dependencies, and the matrix-vector multiplications can be parallelized easily. In addition, one can parallelize computation per UE as the FAME problem in Equation 11 is independent for $u = 1, \ldots, U$. In fact, a simple hardware engine could be used to efficiently execute FAME-FBS to determine the low-resolution equalization vectors $x_u^H$.
After computing the equalization matrix, one must perform spatial equalization on the received signal vectors y at the rate of the ADCs. For the infinite-precision L-MMSE equalizer, this corresponds to computing one high-resolution matrix-vector product $\hat{s} = W^H y$ per receive vector. For finite-alphabet equalizers, this corresponds to a low-resolution matrix-vector product $z = X^H y$, followed by U high-resolution products $\hat{s}_u = \beta_u^* z_u$, $u = 1, \ldots, U$. The complexity of equalization is summarized in Table II, which distinguishes between high-resolution and low-resolution multiplications. While finite-alphabet equalization performs more multiplications than a conventional equalizer, most of these multiplications are performed at low resolution. Thus, for sufficiently low resolution, finite-alphabet equalization effectively reduces the complexity of spatial equalization.
While spatial equalization must be carried out at symbol rate, the equalization matrix needs to be recomputed only when the channel matrix changes. Due to operation at extremely high bandwidths, the complexity of performing equalization will dominate in most mmWave systems. For scenarios with short coherence times, methods that minimize the complexity of computing the equalization matrix are to be preferred.
With reference to
Power control is simulated for
Hardware-Level Evaluation: To demonstrate the real-world benefits of finite-alphabet equalization, the power and area savings that can be attained in comparison with conventional, high-resolution equalizers are quantified. To arrive at a fair comparison between finite-alphabet equalization and conventional, high-resolution equalizers, two equalization circuits were implemented: one for finite-alphabet equalization and one for high-resolution equalization.
The high-resolution equalizer computes a matrix-vector product between the U×B equalization matrix $W^H$ and the received vector y. The matrix-vector product is computed in a column-by-column fashion by using a linear array of U parallel multiply-accumulate (MAC) units over B clock cycles. The multipliers in the MAC units are high-resolution and take as input 10-bit numbers from the equalization matrix $W^H$ and 7-bit numbers from the received vector y. The accumulators in the MAC units use 18 bits. Finally, 9 bits are taken from both real and imaginary accumulators as the outputs of each MAC unit. These outputs correspond to the estimates $\hat{s} = W^H y$.
The finite-alphabet equalizer computes a low-resolution matrix-vector product between the U×B finite-alphabet matrix $X^H$ and the received vector y. This matrix-vector product is implemented in the same way as in the traditional equalizer, with the difference that far fewer bits are used for the multipliers and accumulators. The multipliers take as input r-bit numbers from $X^H$ and 7-bit numbers from y, while the accumulators use r+13 bits (except for the case where r=1, where the accumulators use 13 bits). In each MAC unit, 9 bits are taken from the accumulators as the output of the low-resolution matrix-vector product $X^H y$. Unlike conventional equalization, the entries of the U-dimensional vector $X^H y$ are scaled by the values in $\beta$. This scaling operation is implemented with a high-resolution multiplier that computes the product between the 9-bit $x_u^H y$ and the 10-bit scaling factor $\beta_u$. The outputs of this multiplier are the estimates $\hat{s} = V^H y$, which are represented with 9 bits each for the real and imaginary components.
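The dataflow of both circuits can be sketched in Python with NumPy as follows. This is a floating-point illustration of the column-by-column schedule only; it does not model the bit widths listed above, and the function name `mac_array`, the random matrices, and the random scaling factors are placeholders introduced for the example.

```python
import numpy as np

def mac_array(M_H, y):
    """U parallel multiply-accumulate units; one column of M_H per clock cycle."""
    U, B = M_H.shape
    acc = np.zeros(U, dtype=complex)   # one accumulator per MAC unit
    for b in range(B):                 # B clock cycles
        acc += M_H[:, b] * y[b]        # all U MAC units update in the same cycle
    return acc

rng = np.random.default_rng(1)
B, U = 16, 4
y = rng.standard_normal(B) + 1j * rng.standard_normal(B)

# High-resolution equalizer: s_hat = W^H y (W^H is a random stand-in here).
W_H = rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))
s_hat_high = mac_array(W_H, y)

# Finite-alphabet equalizer: low-resolution product, then U scalar scalings.
X_H = rng.choice([-1.0, 1.0], (U, B)) + 1j * rng.choice([-1.0, 1.0], (U, B))
beta = rng.standard_normal(U) + 1j * rng.standard_normal(U)   # placeholder factors
z = mac_array(X_H, y)
s_hat_fa = np.conj(beta) * z           # s_hat_u = beta_u^* z_u
```

The two paths share the same MAC schedule; the finite-alphabet path adds only U scalar multiplications after the accumulation.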
Table III lists post-layout implementation results for the circuits discussed above implemented for a B=256 BS antenna, U=16 UE system, using a 28 nm CMOS technology. The traditional, high-resolution equalizer corresponds to the design with an equalization resolution r of 10 bits, whereas the finite-alphabet equalizer was implemented for r={1, 2, . . . , 5} bits. To allow for a fair comparison between the different equalization circuits, we consider a scenario in which all of the designs support the same throughput. A throughput of 2 G (complex-valued) vectors/s is assumed, which implies that the 2B ADCs at the BS run at 2 G samples/s.
As seen from Table III, a single instance of the equalizer design reaches throughputs of the order of M vectors/s, which is well below the target throughput of 2 G vectors/s. However, a time-multiplexed array of equalizers that achieves the desired throughput can be instantiated (at the expense of increased area).
In the example of
Note that the power and area can be reduced even further. Once the number of bits in the equalization matrix has been reduced to 5 bits or below, emerging processing-in-memory architectures can lower the area and power by about 2× to 4×, in addition to the savings above.
In some embodiments, the finite-alphabet equalization approach described above is extended by unbiased estimation and soft-output computation. A compact expression of the post-equalization MSE is derived, which can be used to efficiently compute log-likelihood ratio (LLR) values. The effectiveness of this extension is demonstrated by error-rate simulation results for a coded massive MU-MIMO-OFDM system, for two unbiased soft-output finite-alphabet equalizers, both in LoS and non-LoS mmWave channel scenarios.
This subsection focuses on a massive MU-MIMO embodiment of the wireless communications system 18 of
MSE=s,n[∥
Under the statistical assumptions on s and n of Section I-A above, the L-MMSE equalization matrix is given by:
$W^H = (\rho I_U + H^H H)^{-1} H^H$ Equation 34
where ρ=N0/ES. The rows wuH, u=1, . . . , U, of the L-MMSE equalizer W
$w_u = \arg\min_{\tilde{w}} \|e_u - H^H \tilde{w}\|_2^2 + \rho\|\tilde{w}\|_2^2$ Equation 35
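The equivalence between the matrix form in Equation 34 and the per-UE problems in Equation 35 can be checked numerically. The following Python sketch uses a random channel and an arbitrary value of ρ, both chosen only for illustration; the closed-form solution of Equation 35 used in the code follows from setting its gradient to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
B, U, rho = 16, 4, 0.5
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)

# Equation 34: W^H = (rho I_U + H^H H)^{-1} H^H.
W_H = np.linalg.solve(rho * np.eye(U) + H.conj().T @ H, H.conj().T)

# Equation 35 for every u at once: a zero gradient gives
# (H H^H + rho I_B) w_u = H e_u = h_u, so the solutions are the columns below.
W_cols = np.linalg.solve(H @ H.conj().T + rho * np.eye(B), H)

max_diff = np.abs(W_H - W_cols.conj().T).max()
```

The first form inverts a U×U matrix and the second a B×B matrix, so the first is the cheaper route when B is much larger than U.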
Spatial equalization with a biased L-MMSE estimate for each user u=1, . . . , U amounts to computing:
where hu is the uth column of H and
is the noise-plus-interference (NPI) vector. In general, the L-MMSE equalizer has rows for which $w_u^H h_u \neq 1$. Thus, to perform unbiased estimation, the goal is to compute the estimates for each UE $u = 1, \ldots, U$ as follows:
In general, the biased
While the discussion in subsections A-C above focuses on hard-output data detection, coded communication systems benefit from spatial equalizers that compute soft outputs. To fully exploit forward error correction, the post-equalization NPI variance is extracted and then used to generate LLR values. For the uth UE, the NPI variance is given by the MSE of the unbiased estimate $\hat{s}_u$, which is computed as follows:
Here, (a) follows from Equation 37 and (b) from Equation 12. Note that this result applies to any finite-alphabet equalizer as in Equation 7, as long as βu (xu) is computed as in Equation 12.
With this, soft outputs can be computed in the form of LLR values, by assuming that the residual error $\hat{s}_u - s_u$ is circularly-symmetric Gaussian with variance $\nu_u^2$. Concretely, the LLR values are computed as follows:
Here, q(1) and q(0) are the subsets of the constellation in which the qth bit is 1 and 0, respectively. Note that computing soft outputs for finite-alphabet equalizers entails the same complexity as for infinite-precision L-MMSE.
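A minimal Python sketch of LLR computation under the Gaussian residual-error assumption is given below. The QPSK constellation, its bit mapping, the sign convention (log-probability of bit 1 minus that of bit 0), and the use of exact rather than max-log LLRs are all assumptions made for this illustration and are not taken from Equation 43.

```python
import numpy as np

# Assumed Gray-mapped QPSK: bit 0 selects the sign of the real part,
# bit 1 selects the sign of the imaginary part (1 means negative).
points = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

def llr(s_hat, nu2):
    """Exact LLRs for one unbiased estimate s_hat with NPI variance nu2."""
    # Log-likelihood of each constellation point, up to a common constant.
    metric = -np.abs(s_hat - points) ** 2 / nu2
    out = np.empty(bits.shape[1])
    for q in range(bits.shape[1]):
        num = np.logaddexp.reduce(metric[bits[:, q] == 1])   # points with bit q = 1
        den = np.logaddexp.reduce(metric[bits[:, q] == 0])   # points with bit q = 0
        out[q] = num - den
    return out

values = llr(s_hat=-0.6 + 0.7j, nu2=0.1)
```

In this example the estimate lies closest to the point with bits (1, 0), so the first LLR is positive and the second is negative; a smaller variance makes both magnitudes larger.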
The FL-MMSE and FAME-FBS algorithms described above can be used to obtain the rows xuH of XH. For both algorithms, once xuH is known, the associated βu(xu) is computed using Equation 12; this factor is required to compute the variance νu2 using Equation 42, which is then used to compute LLR values with Equation 43.
Similar to the case of equalization in the uplink, the power consumption and silicon area of precoding in the all-digital mmWave MU-MIMO downlink (e.g., from the core network 30 to the UEs 28 of
A model is presented which focuses on the downlink of a mmWave massive MU-MIMO embodiment of the wireless communications system 18 of
$\mathbb{E}_x[\|x\|_2^2] \le P$ Equation 44
The goal of precoding is to simultaneously transmit constellation points $s_u \in \mathcal{O}$ to the $u = 1, \ldots, U$ UEs while reducing MU interference. Here, $s_u$ is assumed to have zero mean and variance $E_s$, and $\mathcal{O}$ denotes the constellation set (e.g., 16-QAM). The radio access node 20 maps the vector s into the precoded vector x with the aid of channel state information (e.g., received from the channel estimator circuitry 34). The precoded vector x is crafted such that the UEs 28 can form an estimate $\hat{s}_u \in \mathbb{C}$ of the transmitted symbol $s_u$ simply by scaling the received signal $y_u$. Specifically, it is assumed that each UE forms an estimate as $\hat{s}_u = \beta y_u$. Here, $\beta \in \mathbb{R}^+$ is a precoding factor that can be estimated at the UE using pilot-based transmission.
This section focuses on linear precoders for which it holds that $x = Ps$, where $P \in \mathbb{C}^{B \times U}$ is the precoding matrix. Thus, some embodiments use linear precoders that attempt to minimize the MSE between the estimated symbols $\hat{s}$ and the transmitted symbols $s$:
$\mathrm{MSE} = \mathbb{E}_{s,n}[\|s - \hat{s}\|_2^2] = \mathbb{E}_{s,n}[\|s - \beta H x\|_2^2] + \beta^2 U N_0$ Equation 45
Minimizing Equation 45 subject to the power constraint in Equation 44 results in the so-called WF precoder, where the precoding matrix is given by
with:
It is important to realize that the matrix $Q^{WF} \in \mathbb{C}^{B \times U}$ is the solution of the following optimization problem:
$Q^{WF} = \arg\min_{\tilde{Q}} \|I_U - H\tilde{Q}\|_F^2 + \kappa^{WF}\|\tilde{Q}\|_F^2$ Equation 48
The columns $q_u^{WF} \in \mathbb{C}^B$, $u = 1, \ldots, U$, of the matrix $Q^{WF}$ can thus be computed by solving:
$q_u^{WF} = \arg\min_{\tilde{q}} \|e_u - H\tilde{q}\|_2^2 + \kappa^{WF}\|\tilde{q}\|_2^2$ Equation 49
Applying the Woodbury identity to Equation 46 yields:
$Q^{WF} = H^H (H H^H + \kappa^{WF} I_U)^{-1}$ Equation 50
which is the solution to the following optimization problem:
$Q^{WF} = \arg\min_{\tilde{Q}} \|I_B - \tilde{Q}H\|_F^2 + \kappa^{WF}\|\tilde{Q}\|_F^2$ Equation 51
Thus, the rows $q_b^{r,WF}$, $b = 1, \ldots, B$, of $Q^{WF}$ (where the superscript r denotes a row vector) can be computed as:
$q_b^{r,WF} = \arg\min_{\tilde{q}^r} \|e_b^H - \tilde{q}^r H\|_2^2 + \kappa^{WF}\|\tilde{q}^r\|_2^2$ Equation 52
The alternative optimization problems in Equation 49 and Equation 52 to compute the matrix QWF will become useful in the next subsection.
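The relationship between the column-wise and row-wise formulations can be checked numerically. In the Python sketch below, the B×B expression is the closed-form minimizer of Equation 48 and is assumed to coincide with Equation 46; the channel, the dimensions, and the value of the regularization parameter are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
B, U, kappa = 16, 4, 0.3
# Downlink channel with one row per UE (U x B).
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)
Hh = H.conj().T

# Minimizer of Equation 48: (H^H H + kappa I_B)^{-1} H^H, a B x B solve.
Q_big = np.linalg.solve(Hh @ H + kappa * np.eye(B), Hh)
# Equation 50: H^H (H H^H + kappa I_U)^{-1}, a U x U inverse.
Q_small = Hh @ np.linalg.inv(H @ Hh + kappa * np.eye(U))

u, b = 1, 5
# Equation 49: column u solves a regularized least-squares problem in C^B.
q_col = np.linalg.solve(Hh @ H + kappa * np.eye(B), Hh[:, u])
# Equation 52: row b equals h_b^H (H H^H + kappa I_U)^{-1}, h_b the bth column of H.
q_row = np.linalg.solve(H @ Hh + kappa * np.eye(U), H[:, b]).conj()
```

Both closed forms give the same matrix, so the columns from Equation 49 and the rows from Equation 52 are two views of one solution.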
WF precoding computes
for each transmitted vector s. Unfortunately, digital precoding circuitry will be power hungry and large as mmWave MU-MIMO systems operate with high-dimensional data and extremely high sampling rates. As a remedy, FAWP proposes to represent the matrix QWF using coarsely quantized numbers, with the objective of reducing the hardware complexity of the matrix-vector product QWFs. Unfortunately, a direct quantization of the matrix QWF typically leads to a significant error-rate degradation.
In order to design low-resolution matrices that are WF-optimal, i.e., that best mimic the infinite-precision WF-precoding matrix QWF, finite-alphabet matrices are used, similar to those described above for spatial equalization in the mmWave MU-MIMO uplink. Since finite-alphabet matrices are applied to imitate the WF-precoding matrix QWF, they are referred to herein as FAWP matrices. FAWP matrices introduce a few high-resolution scaling factors that help to bring a low-resolution matrix to the right scale. Two distinct FAWP matrix structures, namely pre-FAWP and post-FAWP matrices, are considered here.
Pre-FAWP Matrix: A pre-FAWP matrix is defined as a B×U matrix with the structure:
$Q = A\,\mathrm{diag}(\alpha^*)$ Equation 53
where $A \in \mathcal{X}^{B \times U}$ is a low-resolution matrix with entries taken from the finite alphabet $\mathcal{X}$ and $\alpha \in \mathbb{C}^U$ is a vector with per-UE scaling factors.
By using a pre-FAWP matrix, the matrix-vector product $Qs$ becomes $A(\mathrm{diag}(\alpha^*)s)$. Such a matrix is called pre-FAWP as the U entries of the transmitted symbol vector s are scaled by the entries of $\alpha^*$ before being multiplied with the matrix A. Pre-FAWP reduces the hardware complexity of $Qs$ since the matrix A has low-resolution entries. Consider, for example, the case in which the entries of A are chosen from the 1-bit alphabet $\mathcal{X} = \{\pm 1 \pm j\}$; multiplying this matrix A with the vector $\mathrm{diag}(\alpha^*)s$ does not require hardware multipliers, but only adders and subtractors.
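The multiplier-free property of the 1-bit case can be illustrated with the following Python sketch, in which the matrix A is stored as two sign arrays and the product is formed with sign selection, additions, and subtractions only. The function name, the boolean encoding of the signs, and the random scaling factors are assumptions made for the example.

```python
import numpy as np

def one_bit_matvec(sign_re, sign_im, v):
    """Compute A @ v for A = sr + j*si with sr, si in {+1, -1}, without multiplications.

    sign_re and sign_im are boolean B x U arrays; True encodes +1, False encodes -1.
    """
    vr, vi = v.real, v.imag
    # (sr + j si)(vr + j vi) = (sr vr - si vi) + j (sr vi + si vr)
    re = np.where(sign_re, vr, -vr) - np.where(sign_im, vi, -vi)
    im = np.where(sign_re, vi, -vi) + np.where(sign_im, vr, -vr)
    return re.sum(axis=1) + 1j * im.sum(axis=1)

rng = np.random.default_rng(4)
B, U = 8, 3
sign_re = rng.random((B, U)) < 0.5
sign_im = rng.random((B, U)) < 0.5
A = np.where(sign_re, 1.0, -1.0) + 1j * np.where(sign_im, 1.0, -1.0)
alpha = rng.standard_normal(U) + 1j * rng.standard_normal(U)   # placeholder factors
s = (rng.choice([-1.0, 1.0], U) + 1j * rng.choice([-1.0, 1.0], U)) / np.sqrt(2)

v = np.conj(alpha) * s    # diag(alpha^*) s: the only U high-resolution products
x = one_bit_matvec(sign_re, sign_im, v)
```

Each matrix entry costs two conditional negations and two additions per real output, which is the operation a hardware adder/subtractor pair would perform.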
To calculate pre-FAWP matrices that are WF-optimal, the problem in Equation 49 is solved by assuming that Q has the form given by Equation 53. By doing so, the following procedure is arrived at:
The problem in Equation 48 is equivalent to solving the following optimization problem for each UE u=1, . . . , U:
Here, au is the uth column of A, hur is the uth row of H, and the associated optimal scaling factor is given by:
Equation 54 can be established by first plugging Equation 53 into Equation 49. Then, Equation 55 is obtained by taking the Wirtinger derivative with respect to $\alpha_u$.
Post-FAWP Matrix: A post-FAWP matrix is defined as a B×U matrix with the structure:
$Q = \mathrm{diag}(\zeta) Z^H$ Equation 56
where $Z \in \mathcal{X}^{U \times B}$ is a low-resolution matrix with entries taken from the finite alphabet $\mathcal{X}$ and $\zeta \in \mathbb{C}^B$ is a vector with per-BS-antenna scaling factors.
By using a post-FAWP matrix, the matrix-vector product $Qs$ becomes $\mathrm{diag}(\zeta)(Z^H s)$. Such a matrix is called post-FAWP as the B scaling factors in $\zeta$ are applied after multiplying the matrix $Z^H$ with the vector s. Post-FAWP reduces the hardware complexity of $Qs$ since the B×U matrix-vector product $Z^H s$ can be implemented using exclusively low-resolution arithmetic units. The results of $Z^H s$ are then entry-wise scaled by $\zeta$, which requires only B high-resolution scalar multiplications.
Akin to the case of pre-FAWP matrices, post-FAWP matrices are obtained that are WF-optimal by solving the problem in Equation 52 with a matrix Q that has the form given in Equation 56. By doing so, the following procedure is arrived at:
The problem in Equation 51 is equivalent to solving the following optimization problem for each BS antenna b=1, . . . , B:
Here, zb is the bth column of Z, hb is the bth column of H, and the associated optimal scaling factor is given by:
In summary, both pre-FAWP and post-FAWP matrices are composed of a low-resolution matrix and a set of scaling factors. The difference is that a pre-FAWP matrix applies its U scaling factors before the multiplication with the low-resolution matrix, whereas a post-FAWP matrix applies its B scaling factors after matrix multiplication. As $B \gg U$ in typical massive MU-MIMO systems, a pre-FAWP matrix performs fewer high-resolution scaling operations than a post-FAWP matrix. However, the matrix-vector product is simpler with a post-FAWP matrix than with a pre-FAWP matrix, since the vector has a lower resolution as the symbols in s are not scaled yet. Thus, neither pre-FAWP nor post-FAWP matrices have a clear advantage over the other in terms of hardware complexity. Nonetheless, both FAWP matrix structures are expected to reduce hardware complexity when compared to traditional precoding, as the low-resolution matrices in both structures have coarsely quantized entries.
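The order of operations in the two structures can be compared side by side in Python. The sketch below uses random 1-bit matrices and random scaling factors as placeholders; it illustrates only where the high-resolution scalings occur, not how the matrices or factors are computed.

```python
import numpy as np

rng = np.random.default_rng(5)
B, U = 8, 3

def one_bit(shape):
    """Random entries from the 1-bit alphabet {+-1 +- j} (placeholder data)."""
    return rng.choice([-1.0, 1.0], shape) + 1j * rng.choice([-1.0, 1.0], shape)

s = one_bit(U) / np.sqrt(2)

# Pre-FAWP (Equation 53): U high-resolution scalings, then the low-resolution product.
A = one_bit((B, U))
alpha = rng.standard_normal(U) + 1j * rng.standard_normal(U)
x_pre = A @ (np.conj(alpha) * s)

# Post-FAWP (Equation 56): low-resolution product, then B high-resolution scalings.
Z = one_bit((U, B))
zeta = rng.standard_normal(B) + 1j * rng.standard_normal(B)
x_post = zeta * (Z.conj().T @ s)
```

In the pre-FAWP path the low-resolution matrix multiplies an already scaled, higher-resolution vector, while in the post-FAWP path it multiplies the raw constellation symbols.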
Computing FAWP Matrices: Different methods are proposed to compute pre-FAWP and post-FAWP matrices defined in Equation 53 and Equation 56, respectively. Means to estimate the precoding factor β are also discussed.
For pre-FAWP and post-FAWP matrices, the scaling factors are computed by means of Equation 55 and Equation 58, respectively, regardless of how the low-resolution matrix (A for pre-FAWP and Z for post-FAWP) is computed. Instead of solving the problems in Equation 54 or Equation 57, a simple approach is to directly quantize the infinite-precision matrix $Q^{WF}$. This approach is called FAWP-WF; more specifically, pre-FAWP-WF and post-FAWP-WF when applied to pre-FAWP and post-FAWP matrices, respectively.
$Q^{WF}$ is quantized following the method put forward in Section I. For pre-FAWP-WF, the maximum value $w_{\max}$ of $[|\Re\{q_u^{WF}\}|; |\Im\{q_u^{WF}\}|]$ is found for each column $q_u^{WF}$ of $Q^{WF}$. The range $[-w_{\max}, w_{\max}]$ is then divided into uniform-width bins, where each bin is represented by its centroid value. The centroid values are scaled by the same factor so that they are integer numbers, which preserves the objective value in Equation 54 and results in the low-resolution entries of the column $a_u$. For post-FAWP-WF, the same procedure is applied on a per-row basis: each quantized row of $Q^{WF}$ corresponds to one row of $Z^H$.
Since the problems in Equation 54 and Equation 57 are NP-hard, FAWP-WF significantly reduces complexity. Concretely, FAWP-WF requires the same complexity of $O(BU^2)$ as computing the infinite-precision $Q^{WF}$ in Equation 50. As a result, we will use FAWP-WF as a baseline to evaluate the performance of the algorithm proposed next, which tackles the problems in Equation 54 and Equation 57.
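The per-column quantization can be sketched in Python as follows. The number of bins ($2^r$ for r bits), the rescaling of the centroids to odd integers, and the random stand-in for the WF matrix are assumptions made for this illustration.

```python
import numpy as np

def quantize_column(q, r):
    """Uniformly quantize the real and imaginary parts of one column to r bits."""
    levels = 2 ** r
    # Largest magnitude over real and imaginary parts (assumes a nonzero column).
    w_max = max(np.abs(q.real).max(), np.abs(q.imag).max())
    width = 2.0 * w_max / levels            # uniform bin width over [-w_max, w_max]

    def to_level(v):
        # Bin index of each value; the value w_max itself falls in the last bin.
        idx = np.clip(np.floor((v + w_max) / width), 0, levels - 1)
        # Bin centroids rescaled by a common factor to the odd integers.
        return 2.0 * idx - (levels - 1)

    return to_level(q.real) + 1j * to_level(q.imag)

rng = np.random.default_rng(6)
B, U, r = 8, 3, 2
Q_stand_in = rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))
A = np.stack([quantize_column(Q_stand_in[:, u], r) for u in range(U)], axis=1)
```

For r = 2 every real and imaginary part of A lies in {-3, -1, +1, +3}; because each column is rescaled by its own factor, the scale that is lost here is what the per-UE scaling factors restore.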
FAWP via Forward-Backward Splitting (FBS): Similar to finite-alphabet equalization matrices in Section I, the FAWP problems in Equation 54 and Equation 57 can also be approximately solved using FBS, an approach dubbed FAWP-FBS. Pre-FAWP-FBS is presented as an algorithm for computing the low-resolution part of a pre-FAWP matrix starting from the problem in Equation 54. The algorithm for post-FAWP matrices, dubbed post-FAWP-FBS, can be derived in a similar way starting from Equation 57.
As in Section I, it is assumed that the optimal objective value γu of Equation 54, u=1, . . . , U, is known. Then, solving the problem in Equation 54 is equivalent to solving the following problem:
As γu is unknown, we will use it as a parameter that can be tuned to empirically improve the performance of our algorithm.
Next, the finite-alphabet constraint $\tilde{a} \in \mathcal{X}^B$ in Equation 59 is relaxed to $\tilde{a} \in \mathcal{B}^B$, where $\mathcal{B}$ represents the convex hull of $\mathcal{X}$. By doing so, the all-zeros vector $0_{B \times 1}$ becomes a trivial solution. To avoid this solution, the term
with δ>0, is included in Equation 59 to encourage large entries in the vector ã. The resulting optimization problem is:
Now FBS can be applied. FBS is an efficient procedure for solving convex problems of the form $\hat{a} = \arg\min_{\tilde{a}} f(\tilde{a}) + g(\tilde{a})$, where both functions $f$ and $g$ are convex, but $f$ is smooth and $g$ is not necessarily smooth or bounded. FBS is an iterative method that runs for $t_{\max}$ iterations or until convergence. In each iteration $t$, FBS computes:
$\tilde{v}^{(t+1)} = \tilde{a}^{(t)} - \tau^{(t)} \nabla f(\tilde{a}^{(t)})$ Equation 61
$\tilde{a}^{(t+1)} = \mathrm{prox}_g(\tilde{v}^{(t+1)}; \tau^{(t)})$ Equation 62
where $\nabla f(\tilde{a}^{(t)})$ is the gradient of the function $f$ and $\{\tau^{(t)} > 0\}$ is a sequence of step sizes. The proximal operator of the function $g$ is defined as
Since the problem in Equation 60 is non-convex, FBS is not guaranteed to converge to an optimal solution. Nevertheless, FBS is used to approximately solve Equation 60 by setting:
where the indicator function of the convex set $\mathcal{B}^B$ is zero if $\tilde{a} \in \mathcal{B}^B$ and infinity otherwise. The indicator function is used to incorporate the convex constraint $\tilde{a} \in \mathcal{B}^B$ in Equation 60 into the function $g(\tilde{a})$. These choices for $f(\tilde{a})$ and $g(\tilde{a})$ result in:
$\nabla f(\tilde{a}) = H^H H \tilde{a} - \gamma_u (h_u^r)^H h_u^r \tilde{a}$ Equation 65
$\mathrm{prox}_g(\tilde{v}) = \mathrm{sgn}(\Re\{\tilde{v}\}) \min\{\nu^{(t)} |\Re\{\tilde{v}\}|, 1\} + j\,\mathrm{sgn}(\Im\{\tilde{v}\}) \min\{\nu^{(t)} |\Im\{\tilde{v}\}|, 1\}$ Equation 66
where $\nu^{(t)} = (1 + \tau^{(t)}(\kappa^{WF} - \delta))^{-1}$ and Equation 66 is applied element-wise to $\tilde{v}$. Pre-FAWP-FBS can be summarized as follows:
To tune the algorithm parameters {τ(t)}, {ν(t)}, and {γ(t)}, some embodiments use a neural-network-based approach. Note that γu is replaced with γ(t) in Algorithm 2 in order to (i) keep the algorithm general for different user locations and (ii) to increase flexibility during optimization.
Post-FAWP-FBS is now summarized, which can be derived following similar steps as for the derivation of pre-FAWP-FBS.
We note that both FAWP-FBS algorithms have the same complexity order of $O(BU^2)$ as WF and FAWP-WF.
While the BS is able to compute the precoding factor $\beta$ via Equation 47 with a FAWP matrix Q instead of $Q^{WF}$, the UEs need to estimate this precoding factor in order to correctly estimate the transmitted symbols in s. Estimation can be achieved in a block-fading scenario by transmitting a pilot symbol that is known at the UE side. Specifically, the BS will transmit the pilot $s_u = \sqrt{E_s}$, $u = 1, \ldots, U$. Then, the uth UE will receive $y_u = \beta^{-1} h_u^r q_u s_u + \check{e}_u + n_u$, where $\check{e}_u$ represents residual interference from the other UEs. The objective now is for the UE to find a $\hat{\beta}_u \in \mathbb{R}^+$ such that it generates an unbiased estimate $\hat{s}_u$ of $s_u$, i.e., $\hat{s}_u = \hat{\beta}_u y_u \approx s_u$.
By taking into account that the transmitted pilot symbol $s_u$ is known to be $\sqrt{E_s}$ and by assuming that $\check{e}_u + n_u$ is zero-mean Gaussian distributed and independent of $s_u$, the UE can compute a maximum likelihood estimate (MLE) of $\hat{\beta}_u$ as:
$\hat{\beta}_u^{MLE} = \Re\{\sqrt{E_s} / y_u\}$ Equation 71
While more pilots could be transmitted to form a better estimate $\hat{\beta}_u^{MLE}$, the results in Subsection C below show that one pilot is sufficient to achieve reliable downlink communication.
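The single-pilot estimate of Equation 71 can be sketched in Python as follows. The effective gain $h_u^r q_u$ is idealized to 1, the residual interference is folded into the noise term, and the numerical values of the precoding factor and noise power are placeholders chosen for the example.

```python
import numpy as np

def estimate_beta(y_u, E_s):
    """Equation 71: real part of sqrt(E_s) / y_u for the known pilot sqrt(E_s)."""
    return np.real(np.sqrt(E_s) / y_u)

rng = np.random.default_rng(7)
E_s, beta, N0 = 1.0, 2.5, 1e-4
gain = 1.0                          # h_u^r q_u, idealized to 1 in this sketch
pilot = np.sqrt(E_s)

# Received pilot without and with interference-plus-noise.
noise = np.sqrt(N0 / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
y_clean = gain * pilot / beta
y_noisy = y_clean + noise

beta_hat_clean = estimate_beta(y_clean, E_s)
beta_hat_noisy = estimate_beta(y_noisy, E_s)
s_hat = beta_hat_noisy * y_noisy    # UE-side estimate of the pilot symbol
```

Without noise the estimate recovers the precoding factor exactly; with noise the error grows as the received pilot amplitude shrinks relative to the noise level.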
Simulation results for both pre-FAWP and post-FAWP matrices are generated by either FAWP-WF or FAWP-FBS. A comparison is provided in terms of BER and EVM versus normalized transmit power, defined as P/N0. For simplicity, the evaluation is restricted to a mmWave system with B=256 BS antennas serving U=16 UEs in a 16-QAM system operating over an i.i.d. Rayleigh-fading channel.
With reference to
The exemplary computer system 1300 in this embodiment includes a processing device 1302 or processor, a system memory 1304, and a system bus 1306. The system memory 1304 may include non-volatile memory 1308 and volatile memory 1310. The non-volatile memory 1308 may include read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The volatile memory 1310 generally includes random-access memory (RAM) (e.g., dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM)). A basic input/output system (BIOS) 1312 may be stored in the non-volatile memory 1308 and can include the basic routines that help to transfer information between elements within the computer system 1300.
The system bus 1306 provides an interface for system components including, but not limited to, the system memory 1304 and the processing device 1302. The system bus 1306 may be any of several types of bus structures that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and/or a local bus using any of a variety of commercially available bus architectures.
The processing device 1302 represents one or more commercially available or proprietary general-purpose processing devices, such as a microprocessor, central processing unit (CPU), or the like. More particularly, the processing device 1302 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or other processors implementing a combination of instruction sets. The processing device 1302 is configured to execute processing logic instructions for performing the operations and steps discussed herein.
In this regard, the various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with the processing device 1302, which may be a microprocessor, field programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or other programmable logic device, a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. Furthermore, the processing device 1302 may be a microprocessor, or may be any conventional processor, controller, microcontroller, or state machine. The processing device 1302 may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The computer system 1300 may further include or be coupled to a non-transitory computer-readable storage medium, such as a storage device 1314, which may represent an internal or external hard disk drive (HDD), flash memory, or the like. The storage device 1314 and other drives associated with computer-readable media and computer-usable media may provide non-volatile storage of data, data structures, computer-executable instructions, and the like. Although the description of computer-readable media above refers to an HDD, it should be appreciated that other types of media that are readable by a computer, such as optical disks, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the operating environment, and, further, that any such media may contain computer-executable instructions for performing novel methods of the disclosed embodiments.
An operating system 1316 and any number of program modules 1318 or other applications can be stored in the volatile memory 1310, wherein the program modules 1318 represent a wide array of computer-executable instructions corresponding to programs, applications, functions, and the like that may implement the functionality described herein in whole or in part, such as through instructions 1320 on the processing device 1302. The program modules 1318 may also reside on the storage mechanism provided by the storage device 1314. As such, all or a portion of the functionality described herein may be implemented as a computer program product stored on a transitory or non-transitory computer-usable or computer-readable storage medium, such as the storage device 1314, volatile memory 1310, non-volatile memory 1308, instructions 1320, and the like. The computer program product includes complex programming instructions, such as complex computer-readable program code, to cause the processing device 1302 to carry out the steps necessary to implement the functions described herein.
An operator, such as the user, may also be able to enter one or more configuration commands to the computer system 1300 through a keyboard, a pointing device such as a mouse, or a touch-sensitive surface, such as the display device, via an input device interface 1322 or remotely through a web interface, terminal program, or the like via a communication interface 1324. The communication interface 1324 may be wired or wireless and facilitate communications with any number of devices via a wireless communications system 18 in a direct or indirect fashion. An output device, such as a display device, can be coupled to the system bus 1306 and driven by a video port 1326. Additional inputs and outputs to the computer system 1300 may be provided through the system bus 1306 as appropriate to implement embodiments described herein.
The operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined.
Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.
This invention was made with government funds under Agreement No. HR0011-18-3-0004 awarded by The Defense Advanced Research Projects Agency (DARPA). The U.S. Government has certain rights in this invention.
Zhang, M. et al., “Highly-Associative Caches for Low-Power Processors,” Kool Chips Workshop, 33rd International Symposium on Microarchitecture, Dec. 2000, Monterey, CA, 6 pages. |
Agrell, E., et al., “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, No. 8, Aug. 2002, pp. 2201-2214. |
Alkhateeb, A., et al., “MIMO precoding and combining solutions for millimeter-wave systems,” IEEE Communications Magazine, vol. 52, No. 12, Dec. 2014, pp. 122-131. |
Author Unknown, “5G; NR; Base Station (BS) radio transmission and reception,” 3GPP Technical Specification 38.104, version 15.5.0, release 15, European Telecommunications Standards Institute, May 2019, 219 pages. |
Balatsoukas-Stimming, A., et al., “Neural-network optimized 1-bit precoding for massive MU-MIMO,” 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications, Cannes, France, Jul. 2-5, 2019, 5 pages. |
Beck, A., et al., “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” Society for Industrial and Applied Mathematics Journal on Imaging Sciences, vol. 2, No. 1, Jan. 2009, pp. 183-202. |
Björnson, E., et al., “Massive MIMO in sub-6 GHz and mmWave: Physical, practical, and use-case differences,” IEEE Wireless Communications Magazine, vol. 26, No. 2, Apr. 2019, pp. 100-108. |
Castañeda, O., et al., “1-bit massive MU-MIMO precoding in VLSI,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 7, No. 4, Dec. 2017, pp. 508-522. |
Castañeda, O., et al., “Data detection in large multi-antenna wireless systems via approximate semidefinite relaxation,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, No. 12, Dec. 2016, pp. 2334-2346. |
Castañeda, O., et al., “PPAC: A versatile in-memory accelerator for matrix-vector-product-like operations,” 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors, New York, NY, Jul. 15-17, 2019, 8 pages. |
Castañeda, O., et al., “VLSI design of a 3-bit constant-modulus precoder for massive MU-MIMO,” 2018 IEEE International Symposium on Circuits and Systems, Florence, Italy, May 4, 2018, 5 pages. |
Chang, T.-H., et al., “A linear fractional semidefinite relaxation approach to maximum-likelihood detection of higher-order QAM OSTBC in unknown channels,” IEEE Transactions on Signal Processing, vol. 58, No. 4, Apr. 2010, pp. 2315-2326. |
Dutta, S., et al., “A case for digital beamforming at mmWave,” Jan. 24, 2019, https://arxiv.org/pdf/1901.08693.pdf, 30 pages. |
Fatema, N., et al., “Massive MIMO linear precoding: A survey,” IEEE Systems Journal, vol. 12, No. 4, Dec. 2018, pp. 3920-3931. |
Fincke, U., et al., “Improved methods for calculating vectors of short length in a lattic, including a complexity analysis,” Mathematics of Computation, vol. 44, No. 170, Apr. 1985, pp. 463-471. |
Goldstein, T., et al., “A field guide to forward-backward splitting with a FASTA implementation,” https://arxiv.org/pdf/1411.3406v1.pdf, Nov. 12, 2014, 17 pages. |
Goldstein, T., et al., “High-order methods for basis pursuit,” UCLA Computational Applied Mathematics (CAM) Report, Jul. 2010, 17 pages. |
Guo, Q., et al., “AC-DIMM: Associative computing with STT-MRAM,” ACM SIGARCH Computer Architecture News, vol. 41, No. 3, Jun. 2013, pp. 189-200. |
Jacobsson, S. et al., “Nonlinear 1-bit precoding for massive MU-MIMO with higher order modulation,” 2016 50th Asilomar Conference on Signals, Systems and Computers, Nov. 6-9, 2018, Pacific Grove, CA, pp. 763-767. |
Jacobsson, S., et al., “Quantized precoding for massive MU-MIMO,” IEEE Transactions on Communications, vol. 65, No. 11, Nov. 2017, pp. 4670-4684. |
Jacobsson, S., et al., “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. 16, No. 6, Jun. 2017, pp. 4038-4051. |
Jaeckel, S., et al., “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Transactions on Antennas and Propagation, vol. 62, No. 6, Jun. 2014, pp. 3242-3256. |
Jia, H., et al., “A microprocessor implemented in 65nm CMOS with configurable and bit-scalable accelerator for programmable in-memory computing,” https://arxiv.org/ftp/arxiv/papers/1811/1811.04047.pdf, Nov. 2018, 10 pages. |
Joham, M., et al., “Linear transmit processing in MIMO communications systems,” IEEE Transactions on Signal Processing, vol. 53, No. 8, Aug. 2005, pp. 2700-2712. |
Larsson, E.G., et al., “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, No. 2, Feb. 2014, pp. 186-195. |
Li, S., et al., “DRISA: A DRAM-based reconfigurable in-situ accelerator,” 5-th Annual IEEE/ACM Sympoium on Microartchitecture, Oct. 14-18, 2017, Cambridge, MA, pp. 288-301. |
Luo, Z., et al., “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Processing Magazine, vo. 27, No. 3, May 2010, pp. 20-34. |
Mo, J., et al., “Capacity analysis of one-bit quantized MIMO systems with transmitter channel state information,” IEEE Transactions on Signal Processing, vol. 63, No. 20, Oct. 15, 2015, pp. 5498-5512. |
Mo. J., et al., “Hybrid architectures with few-bit ADC receivers: Achievable rates and energy rate tradeoffs,” IEEE Transactions on Wireless Communications, vol. 16, No. 4, Apr. 2017, pp. 2274-2287. |
Nair, R., “Evolution of memory architecture,” Proceedings of the IEEE, vol. 103, No. 8, Aug. 2015, pp. 1331-1345. |
Parikh, N., et al., “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, No. 3, 2013, pp. 123-231. |
Paulraj, A., et al., “Introduction to space-time wireless communications,” Cambridge University Press, May 29, 2003, 277 pages. |
Petersen, K. B., et al., “The matrix cookbook,” Nov. 15, 2012, 72 pages. |
Pi, Z., et al., “An introduction to millimeter-wave mobile broadband systems,” IEEE Communications Magazine, vol. 49, No. 6, Jun. 2011, pp. 101-107. |
Rappaport, T.S., et al., “Millimeter wave mobile communications for 5G cellular: It will work!,” IEEE Access, vol. 1, May 2013, pp. 335-349. |
Roh, W., et al., “Millimeter-wave beamforming as an enabling technology for 5G cellular communications: Theoretical feasibility and prototype results,” IEEE Communications Magazine, vol. 52, No. 2, Feb. 2014, pp. 106-113. |
Roth, K., et al., “Achievable rate and energy efficiency of hybrid and digital beamforming receivers with low resolution ADC,” IEEE Journal on Selected Areas in Communications, vol. 35, No. 9, Sep. 2017, pp. 2056-2068. |
Rusek, F., et al., “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process Magazine, vol. 30, No. 1, Jan. 2013, pp. 40-60. |
Sadhu, B., et al., “A 28GHz 32-element phased-array transceiver IC with concurrent dual polarized beams and 1.4 degree beam-steering resolution for 5G communication,” 2017 IEEE International Solid-State Circuits Conference, Feb. 2017, pp. 128-129. |
Shah, S., et al., “Biconvex relaxation for semidefinite programming in computer vision,” European Conference on Computer Vision (ECCV), Sep. 2016, pp. 717-735. |
Studer, C. et al., “ASIC implementation of soft-input soft-output MIMO detection using MMSE parallel interference cancellation,” IEEE Journal of Solid-State Circuits, vol. 46, No. 7, Jul. 2011, pp. 1754-1765. |
Studer, C., et al., “Quantized massive MU-MIMO-OFDM uplink,” IEEE Transactions on Communications, vol. 64, No. 6, Jun. 2016, pp. 2387-2399. |
Swindlehurst, A.L., et al., “Millimeterwave massive MIMO: The next wireless revolution?,” IEEE Communications Magazine, vol. 52, No. 9, Sep. 2014, pp. 56-62. |
Tse, D., et al., “Fundamentals of Wireless Communication,” Cambridge University Press, Jun. 27, 2005, 586 pages. |
Tuchler, M., et al.,“Minimum mean squared error equalization using a priori information,” IEEE Transactions on Signal Processing, vol. 50, No. 3, Mar. 2002, pp. 673-683. |
Verdu, S., “Computational complexity of optimum multiuser detection,” Algorithmica, vol. 4, No. 1, Jun. 1989, pp. 303-312. |
Wu, M., et al., “High-throughput data detection for massive MU-MIMO-OFDM using coordinate descent,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, No. 12, Dec. 2016, pp. 2357-2367. |
Wu, M., et al., “Large-scale MIMO detection for 3GPP LTE: Algorithm and FPGA implementation,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, No. 5, Oct. 2014, pp. 916-929. |
Wulf, W. A., et al., “Hitting the memory wall: Implications of the obvious,”ACM SIGARCH Computer Architecture News, vol. 23, No. 1, Mar. 1995, pp. 20-24. |
Yan, H., et al., “Performance, power, and area design trade-offs in millimeter-wave transmitter beamforming architectures,” IEEE Circuits and Systems Magazine, vo. 19, No. 2, May 2019, pp. 33-58. |
Zimmerman, R., “Computer arithmetic: Principles, architectures, and VLSI design,” Integrated Systems Laboratory, Swiss Federal Institute of Technology (ETH), Zürich, Switzerland, Mar. 16, 1999, 26 pages. |