The present invention relates to a pre-compensation method and a pre-compensation circuit thereof, and more particularly, to a pre-compensation method and a pre-compensation circuit, which reduce computational complexity or improve accuracy or efficiency.
Power amplifiers are inherently non-linear. The nonlinearity degrades the bit-error rate (BER) and data throughput. To reduce the nonlinearity, a power amplifier (PA) is typically operated at lower power, which results in low efficiency.
On the other hand, digital pre-distortion (DPD) is one of the most cost-effective linearization techniques. A digital pre-distorter (or its algorithm) needs to analyze or model the PA behavior accurately for successful deployment of the digital pre-distorter. As the signal bandwidth gets wider, a PA, with its more complex frequency, electrical, and thermal behaviors, tends to exhibit memory effects. Therefore, more advanced DPD algorithms are required. The most general algorithm for DPD implementation is the Volterra series and its derivatives. However, the large number of coefficients of the Volterra series makes it unattractive for practical applications, because the coefficients increase the computational burden and require more input/output data to achieve statistical confidence in the unknown coefficients. Therefore, there is still room for improvement when it comes to DPD.
It is therefore a primary objective of the present invention to provide a pre-compensation method and a pre-compensation circuit thereof to reduce computational complexity or improve accuracy or efficiency.
An embodiment of the present invention discloses a pre-compensation method, for a pre-compensation circuit coupled to a power amplifier, comprising performing pre-distortion according to at least one parameter or at least one hyperparameter to convert a first input signal received by the pre-compensation circuit into a first pre-distortion output signal; updating the at least one parameter or the at least one hyperparameter according to Bayesian Optimization, Causal Bayesian Optimization, or Dynamic Causal Bayesian Optimization; and performing pre-distortion according to the at least one parameter updated or the at least one hyperparameter updated to convert a second input signal received by the pre-compensation circuit into a second pre-distortion output signal.
An embodiment of the present invention discloses a pre-compensation circuit, coupled to a power amplifier, comprising a digital pre-distortion actuator, configured for performing pre-distortion according to at least one parameter or at least one hyperparameter to convert a first input signal received by the digital pre-distortion actuator into a first pre-distortion output signal; and a training module, configured for updating the at least one parameter or the at least one hyperparameter according to Bayesian Optimization, Causal Bayesian Optimization, or Dynamic Causal Bayesian Optimization, wherein the digital pre-distortion actuator performs pre-distortion according to the at least one parameter updated or the at least one hyperparameter updated to convert a second input signal received by the digital pre-distortion actuator into a second pre-distortion output signal.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Digital pre-distortion (DPD) may correct for the nonlinearity of the PA 150 by modifying the complex waveform at the input before signal(s) enter a digital-to-analog converter (DAC). Viewed another way, the nonlinearity of the PA 150 may correct (i.e., undo) the deliberately distorted waveform of a pre-distortion output signal z(t) of the pre-compensation circuit 120, so the cascade/combination of the pre-compensation circuit 120 and the PA 150 may achieve linearization. As a result, an output signal y(t) of the PA 150 after the pre-compensation circuit 120 appears to come from a highly linear component, and an input signal x(t) may be amplified by a constant gain. As indicated by dotted lines in
DPD (function) may be defined algorithmically through software. In an indirect learning architecture, an algorithm may use two identical memory polynomial models for predistortion and training. For example, the pre-compensation circuit 220 may include a DPD actuator 221 and a training module 222. The DPD actuator 221 is configured to perform pre-distortion of the input signal x(t) in real time. The DPD actuator 221 performs the same/similar computation for each input signal x(t) at high speed, so the DPD actuator 221 is suitably implemented by a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or other circuits. The training module 222 updates parameters or hyperparameters of the DPD actuator 221 based on observations of the output signal y(t) of the PA 250 (as indicated by dotted lines in
Using an indirect learning architecture to train the pre-compensation circuit 220 enables the pre-compensation circuit 220 to be constructed directly (i.e., to find/determine parameters or hyperparameters of the DPD actuator 221) according to the input signal x(t) and the output signal y(t) of the PA 250. The feedback path of the training module 222 may take y(t)/G as its input and ẑ(t) as its output, where G controls the gain of the PA 250 after linearization. The DPD actuator 221 may be regarded as a copy of the feedback path (i.e., a copy of the training module 222), which takes the input signal x(t) as its input and the pre-distortion output signal z(t) as its output. Ideally, y(t)=G×x(t), which renders z(t)=ẑ(t) and makes the error signal e(t) between the pre-distortion output signal z(t) and the output ẑ(t) satisfy e(t)=0. The algorithm converges when the error signal e(t)=z(t)−ẑ(t), or its squared magnitude ∥e(t)∥², is minimized.
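As a rough illustration of the indirect learning described above (and not the claimed implementation), the following Python sketch fits a memory-polynomial post-inverse from y(t)/G to z(t) by least squares on synthetic data and reuses the resulting coefficients as the actuator; the memory-polynomial form, the toy PA model, and the function names are illustrative assumptions.

```python
# Illustrative sketch only: indirect-learning fit of a memory-polynomial
# post-inverse (feedback path y/G -> z_hat), then reuse as the actuator x -> z.
import numpy as np

def memory_polynomial_matrix(u, K=5, M=3):
    """Regressors u(t-m)*|u(t-m)|^(k-1) for k=1..K and memory depth m=0..M-1."""
    N = len(u)
    cols = []
    for m in range(M):
        um = np.concatenate([np.zeros(m, dtype=complex), u[:N - m]])
        for k in range(1, K + 1):
            cols.append(um * np.abs(um) ** (k - 1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
z = (rng.standard_normal(2048) + 1j * rng.standard_normal(2048)) * 0.5  # PA input
y = 10.0 * (z - 0.1 * z * np.abs(z) ** 2)                               # toy PA output
G = 10.0                                                                # target gain

# Feedback path: least-squares fit of y(t)/G -> z_hat(t).
Phi = memory_polynomial_matrix(y / G)
coef, *_ = np.linalg.lstsq(Phi, z, rcond=None)

def dpd_actuator(x, coef, K=5, M=3):
    """Copy of the trained post-inverse applied to the input signal x(t)."""
    return memory_polynomial_matrix(x, K, M) @ coef

e = z - Phi @ coef                                                      # e(t) = z - z_hat
print("residual power:", float(np.mean(np.abs(e) ** 2)))
```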
Due to the memory effect, the PA 250 becomes a nonlinear system with memory. That is, the output signal y(t) currently/instantaneously output by the PA 250 not only depends on the current input, but is also a function of previous/past input values. For example, y(t)=fPA(z(τ)), where τ ∈ (−∞, t).
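The following toy Python model (an illustrative assumption, not the actual PA 250) shows this memory effect: the output sample y(t) depends on the current input z(t) and on past inputs z(t−1), z(t−2), and so on.

```python
# Toy behavioral model (illustrative only): the PA output depends on past inputs.
import numpy as np

def pa_with_memory(z, taps=(10.0, 1.5, 0.5), nl=0.1):
    """y(t) = sum_m taps[m]*z(t-m) - nl*z(t)*|z(t)|^2, i.e., y(t)=fPA(z(tau)), tau<=t."""
    y = np.zeros(len(z), dtype=complex)
    for m, g in enumerate(taps):
        y[m:] += g * z[:len(z) - m]          # contribution of the past input z(t-m)
    return y - nl * z * np.abs(z) ** 2       # plus a static nonlinearity

z = np.exp(1j * np.linspace(0.0, 20.0 * np.pi, 1000)) * np.linspace(0.1, 1.0, 1000)
y = pa_with_memory(z)    # identical instantaneous z(t) values map to different y(t)
```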
Regarding the memory effect,
Specifically, the input signal x(t) received by a DPD actuator 321 of a pre-compensation circuit 320 of the amplification circuit 30 is pre-distorted by the DPD actuator 321 using parameters and hyperparameters of the DPD actuator 321 to thereby output the pre-distortion output signal z(t). The pre-distortion output signal z(t) is amplified by a PA 350 of the amplification circuit 30 to provide the output signal y(t) at the output of the PA 350. A training module 322 of the pre-compensation circuit 320 computes/trains parameters and hyperparameters, such that the pre-distortion introduced by the DPD actuator 321 may compensate for the nonlinearity of the PA 350 (i.e., the pre-distortion is an inverse of the distortion resulting from the nonlinearity of the PA 350). The training module 322 may receive a (digital) signal S4, which represents the output signal y(t) from the output of the PA 350. According to the difference between a signal S1 output by the training module 322 and a signal S2, the combiner 223 (which may be a subtraction unit/circuit) of the pre-compensation circuit 320 outputs an error signal S3. The training module 322 observes the error signal S3 and adaptively configures the parameters and the hyperparameters to minimize a loss function or the error signal S3 (e.g., to make the error signal S3 zero). The loss function may be a function of the output ŷ(t) (e.g., an estimated value) of the model or a function of a label (e.g., an expected value) corresponding to the output ŷ(t). For example, the loss function may be equal to the difference between the output ŷ(t) and the label. The label may be a function of the input signal x(t).
In one embodiment, the signal S1 of the training module 322 may, for example, be divided into the output ẑ(t) (as shown by a solid line in
In another embodiment, the signal S1 of the training module 322 may be, for example, ẑ(t)+ŷ(t−1)/G, ẑ(t)+ẑ(t−1), or a0×ẑ(t)+a1×ẑ(t−1)+a2×ẑ(t−2)+ . . . +ak×ẑ(t−k), where the coefficients a0 to ak are real numbers. In other words, the signal S1 is not only related to the current output signal y(t) of the PA 350, but also related to previous output signals y(t−1), y(t−2), . . . , or y(t−k) output from the PA 350 in the past. The signal S2 may be, for example, z(t)+y(t−1)/G, z(t)+z(t−1), or a0×z(t)+a1×z(t−1)+a2×z(t−2)+ . . . +ak×z(t−k). In other words, the signal S2 is not only related to the current pre-distortion output signal z(t), but also related to previous pre-distortion output signals z(t−1), z(t−2), . . . , or z(t−k). The error signal S3 between the signals S1 and S2 may be, for example, e(t)+e(t−1) or b0×e(t)+b1×e(t−1)+b2×e(t−2)+ . . . +bk×e(t−k), where the coefficients b0 to bk are real numbers. In other words, the signal S3 is not only related to the error signal e(t), but also related to previous error signals e(t−1), e(t−2), . . . , or e(t−k). The parameters or hyperparameters of the response function PA−1 of the DPD actuator 321 may be determined/obtained by minimizing the loss function. As a result, the pre-distortion output signal z(t) that may reduce the final nonlinearity of the output signal y(t) may be determined/obtained.
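As a minimal sketch of the example combinations above (with placeholder coefficient values and synthetic signals), S1, S2, and S3 may be formed from current and past samples as follows.

```python
# Placeholder coefficients; sketch of the example S1/S2/S3 combinations above.
import numpy as np

a = [1.0, 0.5, 0.25]                       # a0, a1, a2
b = [1.0, 0.5, 0.25]                       # b0, b1, b2

def weighted_history(sig, t, coeffs):
    """sum_k coeffs[k] * sig(t-k) over the current and previous samples."""
    return sum(c * sig[t - k] for k, c in enumerate(coeffs))

rng = np.random.default_rng(0)
z_hat = rng.standard_normal(100) + 1j * rng.standard_normal(100)  # training-module output
z = rng.standard_normal(100) + 1j * rng.standard_normal(100)      # actuator output
t = 50
S1 = weighted_history(z_hat, t, a)         # a0*z_hat(t) + a1*z_hat(t-1) + a2*z_hat(t-2)
S2 = weighted_history(z, t, a)             # a0*z(t) + a1*z(t-1) + a2*z(t-2)
S3 = weighted_history(z - z_hat, t, b)     # b0*e(t) + b1*e(t-1) + b2*e(t-2)
```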
As set forth above, the present invention may adopt a recurrent DPD-PA cascade model instead of a Volterra series model. For example,
Before the training of the recurrent DPD-PA cascade model 40 starts, the CFR 210 may be turned off; otherwise, the CFR 210 may affect the training of the recurrent DPD-PA cascade model 40. That is, the training of the pre-compensation circuit 220 occurs when samples of the PA 250 are collected with the CFR 210 disabled, such that all characteristics of the PA 250 may be obtained to train the pre-compensation circuit 220.
In the training (stage) of the recurrent DPD-PA cascade model 40, firstly, the pre-distortion output signal z(t) and the output signal y(t) corresponding to sample(s) of the input signal x(t) are generated according to an (initial) response function PA−1 of the training module 322 and the DPD actuator 321. The pre-distortion output signal z(t) and the output signal y(t) may be used for incremental learning on the training module 322 so as to update the training module 322 (i.e., the response function PA−1 in
After the pre-compensation circuit 320 is updated by the parameters or hyperparameters of the model for the response function PA−1 trained for time instants t to t−k, the training (stage) is completed, and the CFR 210 may be turned on to test the recurrent DPD-PA cascade model 40.
More specifically, in the training (stage), the recurrent DPD-PA cascade model 40 may use Dynamic Causal Bayesian Optimization (or Bayesian Optimization, Causal Bayesian Optimization) to optimize parameters/hyperparameters of the recurrent DPD-PA cascade model 40 instead of using backward propagation to train the recurrent DPD-PA cascade model 40 so as to reduce computational complexity.
Take Bayesian Optimization as an example. Bayesian Optimization is a black-box optimization algorithm for solving extremum problems of functions whose expressions are unknown. For example, L(P1, P2, . . . , Pm, HP1, HP2, . . . , HPn)=uef(P1, P2, . . . , Pm, HP1, HP2, . . . , HPn), where L( ) may represent the loss function of a model (which may serve as an objective function), uef( ) may represent a function whose expression is unknown, P1 to Pm may represent parameters of the model, HP1 to HPn may represent hyperparameters of the model, and m and n are positive integers. In other words, the expression of the relationship function uef( ) relating the loss function to the parameter(s) and the hyperparameter(s) of the model is unknown. The parameter(s) and the hyperparameter(s) that minimize/maximize the loss function L( ) at an arbitrary time may be calculated by using Bayesian Optimization. In this way, the pre-compensation circuit 320 may be updated so that the cascade of the pre-compensation circuit 320 and the PA 350 yields one linear result (i.e., the nonlinearity is compensated for).
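Because the expression of uef( ) is unknown, the loss can only be evaluated by running the DPD-PA chain with candidate settings and measuring the result. The following Python sketch shows such a black-box objective with a toy DPD and a toy PA standing in for the actual hardware; the two parameters used here, and the functions apply_dpd and measure_pa_output, are illustrative assumptions.

```python
# Black-box loss: only its value is observable; its expression uef() is unknown.
# apply_dpd and measure_pa_output are toy stand-ins so the sketch is runnable.
import numpy as np

def apply_dpd(x, params):
    a1, a3 = params                                   # two illustrative DPD parameters
    return a1 * x + a3 * x * np.abs(x) ** 2

def measure_pa_output(z, G=10.0):
    return G * (z - 0.1 * z * np.abs(z) ** 2)         # toy nonlinear PA

def evaluate_loss(params, x, G=10.0):
    z = apply_dpd(x, params)
    y = measure_pa_output(z, G)
    return float(np.mean(np.abs(y - G * x) ** 2))     # deviation from ideal y(t)=G*x(t)

rng = np.random.default_rng(1)
x = (rng.standard_normal(512) + 1j * rng.standard_normal(512)) * 0.3
print(evaluate_loss((1.0, 0.1), x))
```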
For example,
Since the expression of the relationship function uef( ) is unknown, Bayesian Optimization may roughly fit the relationship function uef( ) using partial/finite sampling points and leverage information of previous sampling point(s) to determine the next sampling point so as to find extremum point(s). For example,
Based on the function values of the sampling points that have been found (e.g., the loss function value corresponding to the solid black point P1), Bayesian Optimization estimates mean value(s) and variance(s) of the true loss function and determines the next sampling point (e.g., the solid black point P2) accordingly. The estimated loss function (i.e., the mean value of the loss function at each point) represented by the thick solid line in
In one embodiment, the loss function L( ), the parameters P1 to Pm, and the hyperparameters HP1 to HPn corresponding to one of the solid black points P1-P5 may be obtained from the input signals x(t) to x(t−k), the pre-distortion output signals z(t) to z(t−k), the output signals y(t) to y(t−k), the signals ẑ(t) to ẑ(t−k), ŷ(t) to ŷ(t−k), other signals, or other data internally stored, but is not limited thereto. In one embodiment, the signals ŷ(t) to ŷ(t−k) may be functions of the signals ẑ(t) to ẑ(t−k).
The algorithm of the present invention may use Gaussian process regression to predict the probability distribution of the value of the loss function L( ) at any point based on the function values of the objective function at a set of sampling points. Gaussian process regression may extend to observations with independent, normally distributed noise of known variance. The variance may also be unknown, in which case it may be assumed that the noise has a common variance, and that variance may be included as a hyperparameter. The present invention uses the posterior mean of the Gaussian process that includes noise, which is a drift value rather than the noise of an SINR. In one embodiment, environmental factors such as temperature and humidity or the PA 350 itself may have an influence on the gain of the PA 350, causing a drift of the loss function value with respect to certain parameters and hyperparameters. In other words, the solid black point P5 may not select/correspond to the desired/expected extremum of the functional relationship uef( ), but may select/correspond to a relatively optimized extremum close to the desired/expected extremum of the functional relationship uef( ).
According to result(s) of Gaussian process regression, an acquisition function (which is used to measure the degree to which each point of the loss function is worth exploring) may be constructed, and a (relative) extremum of the acquisition function may be solved so as to determine the next sampling point of the loss function. The acquisition function may be, for example, knowledge gradient (KG), entropy search (ES), or predictive entropy search (PES). Afterwards, the extremum of the loss function over the set of sampling points (which have been found since the beginning) is returned as the extremum of the loss function (e.g., the minimum loss function in response to the optimal parameters and the optimal hyperparameters). The parameters P1 to Pm and the hyperparameters HP1 to HPn to update the pre-compensation circuit 320 may thus be found.
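The following Python sketch illustrates the sampling loop described above using a Gaussian process surrogate (with a WhiteKernel so that the noise variance is a hyperparameter, as discussed earlier) and, for simplicity, an expected-improvement acquisition as a stand-in for the KG/ES/PES acquisitions named above; the toy objective mirrors the black-box loss sketched earlier and is an assumption, not the patent's model.

```python
# Sketch of the Bayesian-Optimization sampling loop: fit a GP surrogate to the
# sampled loss values, maximize an acquisition (expected improvement here, as a
# simple stand-in for KG/ES/PES), sample the loss there, and repeat.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(3)
x_sig = (rng.standard_normal(512) + 1j * rng.standard_normal(512)) * 0.3
bounds = np.array([[0.5, 1.5], [0.0, 0.3]])               # search box for (a1, a3)

def objective(theta):
    a1, a3 = theta                                        # candidate DPD settings
    z = a1 * x_sig + a3 * x_sig * np.abs(x_sig) ** 2      # toy DPD
    y = 10.0 * (z - 0.1 * z * np.abs(z) ** 2)             # toy PA
    return float(np.mean(np.abs(y - 10.0 * x_sig) ** 2))  # black-box loss value

X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(4, 2))  # initial sampling points
Y = np.array([objective(t) for t in X])

kernel = Matern(nu=2.5) + WhiteKernel()                   # noise variance as hyperparameter
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, 2))
    mu, sd = gp.predict(cand, return_std=True)
    imp = Y.min() - mu                                    # improvement over best so far
    zsc = imp / np.maximum(sd, 1e-12)
    ei = imp * norm.cdf(zsc) + sd * norm.pdf(zsc)         # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    Y = np.append(Y, objective(x_next))

print("best loss:", Y.min(), "at:", X[np.argmin(Y)])
```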
In one embodiment, there may be many independent variables to be considered by the algorithm of the present invention (in addition to the parameters P1 to Pm and the hyperparameters HP1 to HPn). As the spatial dimension grows, the performance of Bayesian Optimization may deteriorate exponentially. Therefore, the algorithm of the present invention may extend to Causal Bayesian Optimization (CBO). In other words, the present invention may use Causal Bayesian Optimization to calculate the optimal/minimum loss function when the loss function L( ) is related to the parameters P1 to Pm, the hyperparameters HP1 to HPn, and other independent variable(s).
Specifically, the present invention may find the causal relationship between the loss function L( ), the parameters P1 to Pm, the hyperparameters HP1 to HPn, and/or other independent variable(s) (e.g., a causal graph of the loss function L( ), the parameters P1 to Pm, the hyperparameters HP1 to HPn, and/or other independent variable(s)). Therefore, the loss function L( ), the parameters P1 to Pm, the hyperparameters HP1 to HPn, and other independent variable(s) may be regarded as causal variables. For example,
In one embodiment, a causal model for optimization may be selected based on maximum a posteriori (MAP) estimation and point estimation to obtain/derive a causal graph of a loss function, parameter(s), hyperparameter(s), and other independent variable(s). Accordingly, the causal variables of a causal graph of the causal model (e.g., the number of the causal variables, which attributes a causal variable has, or the number of the attributes of a causal variable) and a causal structure of the causal graph (e.g., how attributes connect to each other) are determined/found/created together (at a time or in one go). Deciding the causal variables and the causal structure simultaneously/in parallel may avoid problems incurred by first deciding the causal variables and then deciding the causal structure.
For example,
In
In one embodiment, a posterior probability P(ƒi, C|wi) of assigning the subdata wi of the grounding data 80g to the observation function ƒi and a causal structure C of the causal graph CG may be maximized so as to determine/derive the corresponding causal structure C and the corresponding causal variable cvi based on the subdata wi of the grounding data 80g. Accordingly, inference of the causal model may be described by combining a Bayesian network (e.g., for the causal structure) with the observation functions (e.g., ƒ(i−1), ƒi, ƒ(j−1), and ƒj). It is noteworthy that the causal variables (e.g., cv(i−1), cvi, cv(j−1), and cvj) and the corresponding causal structure (e.g., C) of the corresponding causal graph (e.g., CG) are obtained/determined together (namely, the causal variables (e.g., cv(i−1), cvi, cv(j−1), and cvj) are learned along/together with the causal structure (e.g., C)), so the causal variables (e.g., cv(i−1), cvi, cv(j−1), and cvj) and the causal structure (e.g., C) may interact/affect/constrain each other.
In one embodiment, the posterior probability P(ƒi, C|wi, Int) may satisfy P(ƒi, C|wi, Int) ∝ P(ƒi, C)·P(wi|ƒi, C, Int) according to the Bayesian rule, where ƒi may denote the corresponding observation function, C may denote the corresponding causal structure, wi may denote part of the grounding data 80g (e.g., subdata), and Int may denote an intervention. In one embodiment, the posterior probability P(ƒi, C|wi) may be proportional to P(ƒi, C)·P(wi|ƒi, C), which may be factored over time instants, for example, as Π_{t=0}^{T} P(w_{i,t}|s_{t−1}, C, ƒi)^{(T−t)} or as a corresponding expression summed over the states s (Σ_s . . . ).
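As a rough, simplified illustration of selecting a causal structure by maximizing a posterior score log P(ƒ, C)+log P(w|ƒ, C), the following Python sketch scores candidate edge sets with a sparsity prior and a linear-Gaussian likelihood; the candidate structures, the prior, and the likelihood are illustrative assumptions rather than the model described above.

```python
# Simplified illustration: score candidate causal structures C (edge sets) with
# log P(f, C) + log P(w | f, C) and keep the MAP structure. Candidate structures,
# the sparsity prior, and the linear-Gaussian likelihood are assumptions.
import itertools
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w = rng.normal(size=(200, 3))                              # subdata: 3 raw signals
w[:, 2] = 0.8 * w[:, 0] + 0.2 * rng.normal(size=200)       # true cause: 0 -> 2

variables = [0, 1, 2]
all_edges = list(itertools.permutations(variables, 2))     # candidate directed edges

def log_likelihood(w, edges):
    """Each variable modeled as a linear-Gaussian function of its parents."""
    ll = 0.0
    for child in variables:
        parents = [p for (p, c) in edges if c == child]
        if parents:
            A = w[:, parents]
            coef, *_ = np.linalg.lstsq(A, w[:, child], rcond=None)
            resid = w[:, child] - A @ coef
        else:
            resid = w[:, child] - w[:, child].mean()
        ll += norm.logpdf(resid, scale=resid.std() + 1e-9).sum()
    return ll

def log_prior(edges):
    return -2.0 * len(edges)                               # prefer sparser structures

best_score, best_edges = -np.inf, ()
for r in range(len(all_edges) + 1):
    for edges in itertools.combinations(all_edges, r):     # acyclicity not enforced here
        score = log_prior(edges) + log_likelihood(w, edges)
        if score > best_score:
            best_score, best_edges = score, edges
print("MAP structure:", best_edges)
```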
As set forth above, a Bayesian probability mechanism may combine the number of causal variables (e.g., including the causal variables cv(i−1), cvi, cv(j−1), and cvj), states of the causal variables, a causal structure of the causal variables, or observation functions for the causal variables (e.g., including the observation functions ƒ(i−1), ƒi, ƒ(j−1), and ƒj) and draw relevant joint inferences to explain/interpret the grounding data 80g, thereby creating the causal graph CG2. The causal variables (e.g., including the causal variables cv(i−1), cvi, cv(j−1), and cvj) of the causal graph CG2 (or the number of the causal variables) and the causal structure (e.g., C) are determined at the same time, thereby differentiating (a) from (b) of
As shown in
In one embodiment, the observation function ƒi may satisfy si,t=ƒi(wi,t). In one embodiment, the observation function ƒi may be implemented using a multivariate Gaussian distribution (or a related expression) over the subdata wi and the remaining subdata z, where z may denote subdata (which does not contribute to the causal variable cvi) within the grounding data 80g, μwi may denote a corresponding mean, and each of the matrixes Lwi (and the like) may denote a corresponding matrix (e.g., a covariance-related matrix) of the multivariate Gaussian distribution.
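One simple reading (an assumption, not the exact formulas referred to above) of such a Gaussian observation function si,t=ƒi(wi,t) is sketched below: the subdata is whitened with a mean vector and a lower-triangular factor L, and the resulting Gaussian log-density serves as the causal-variable state; the numerical values of μ and L are placeholders.

```python
# Illustrative reading of s_{i,t} = f_i(w_{i,t}): whiten the subdata with a mean
# vector and a lower-triangular factor L, and use the Gaussian log-density as the
# causal-variable state. mu_w and L_w below are placeholder values, not learned ones.
import numpy as np

mu_w = np.array([0.0, 1.0])                     # mean of the subdata w_i
L_w = np.array([[1.0, 0.0],
                [0.3, 0.8]])                    # lower-triangular covariance factor

def observation_fn(w_it, mu=mu_w, L=L_w):
    v = np.linalg.solve(L, w_it - mu)           # whitened subdata
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma| with Sigma = L @ L.T
    return -0.5 * (v @ v + log_det + len(mu) * np.log(2.0 * np.pi))

s_it = observation_fn(np.array([0.2, 1.1]))     # scalar state for causal variable cv_i
print(s_it)
```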
In one embodiment, the relationship between causal variables (e.g., cvi) and subdata (e.g., wi) may be unknown, but the causal variables may be predicted/inferred from the subdata using a causal semantic generative model. For example,
In one embodiment, Causal Bayesian Optimization may perform optimization only for causal variables directly related to the loss function (e.g., the parameters P1 to Pm, the hyperparameters HP1 to HPn, and independent variables O1 to Oq in causal graph CG1, which directly point to or affect the loss function LO). In other words, the causal intrinsic dimensionality of Causal Bayesian Optimization is given by the number of the parameters P1 to Pm, the hyperparameters HP1 to HPn, and independent variables O1 to Oq, which are causes/parents of the loss function L( ), rather than the number of causal variables which are causes of the parameters P1 to Pm, the hyperparameters HP1 to HPn, and independent variables O1 to Oq, thereby improving the ability to reason about optimal decision making strategies.
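As a small illustration of this causal intrinsic dimensionality, the following sketch exposes only the direct parents of the loss node of a causal graph to the optimizer; the graph contents are illustrative assumptions.

```python
# Only the direct parents of the loss node are handed to the optimizer, so the
# search dimensionality equals the number of causes of the loss (illustrative graph).
edges = [
    ("P1", "L"), ("P2", "L"), ("HP1", "L"), ("O1", "L"),   # direct causes of the loss
    ("O2", "P1"), ("O3", "HP1"),                           # causes of causes (not optimized)
]

def parents_of(node, edges):
    return sorted({src for (src, dst) in edges if dst == node})

search_space = parents_of("L", edges)
print(search_space)    # ['HP1', 'O1', 'P1', 'P2'] -> variables exposed to the optimizer
```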
In one embodiment, causal variables (e.g., the parameters P1 to Pm, the hyperparameters HP1 to HPn, the independent variables O1 to Oq, which serve as causal variables, or the causal variables cv(i−1), cvi, cv(j−1), cvj) are manually defined (e.g., by domain expert(s)). For example, causal variables are defined by domain experts (nonautomatically and individually); alternatively, causal variables are defined automatically using a program with rules described by domain experts. In one embodiment, subdata (e.g., the subdata w(i−1), wi, w(j−1), and wj corresponding to the framed areas in
Causal Bayesian Optimization treats causal variables being output (e.g., the loss function LO) and causal variables being input (e.g., the parameters P1 to Pm, the hyperparameters HP1 to HPn, or the independent variables O1 to Oq) as invariant independent variables, and disregards the existence of a temporal evolution in both the causal variables being output and the causal variables being input (i.e., whether the causal variables being output and the causal variables being input change over time), and thus breaks the time dependency structure existing among causal variables. While disregarding time may significantly simplify the problem, it prevents the identification of an optimal intervention at every time instant, and (especially in a non-stationary scenario) may lead to a sub-optimal solution instead of providing the current optimal solution at any time instant. Thus, the present invention may extend to Dynamic Causal Bayesian Optimization, which accounts for causal relationships between causal variables that may evolve/change over time, and is therefore advantageous in scenarios where all causal effects in a causal graph vary over time.
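A much-simplified Python sketch of this dynamic setting is given below: the optimization is repeated at every time instant, and the surrogate at time t also conditions on the outcome of the previous time instant, so the time dependency is not discarded; the toy objective, its drift term, and the lower-confidence-bound pick are illustrative assumptions rather than the Dynamic Causal Bayesian Optimization procedure itself.

```python
# Simplified dynamic sketch: re-optimize at every time instant t, with a surrogate
# that also conditions on the previous outcome, so time dependency is retained.
# The drifting toy objective and the lower-confidence-bound pick are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(2)

def loss_at_time(a, t, y_prev):
    drift = 0.05 * t                                       # slow drift (e.g., thermal)
    return (a - (1.0 + drift)) ** 2 + 0.1 * y_prev + 0.02 * rng.standard_normal()

X, Y, chosen, y_prev = [], [], [], 0.0                     # rows of X: [a_t, y_{t-1}]
for t in range(10):
    if len(Y) >= 3:
        gp = GaussianProcessRegressor(Matern(nu=2.5) + WhiteKernel()).fit(X, Y)
        cand = np.linspace(0.0, 2.0, 201)
        feats = np.column_stack([cand, np.full_like(cand, y_prev)])
        mu, sd = gp.predict(feats, return_std=True)
        a_t = float(cand[np.argmin(mu - sd)])              # optimistic pick (minimization)
    else:
        a_t = float(rng.uniform(0.0, 2.0))                 # initial exploration
    y_t = loss_at_time(a_t, t, y_prev)
    X.append([a_t, y_prev]); Y.append(y_t); chosen.append(round(a_t, 3))
    y_prev = y_t
print(chosen)
```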
For example,
A pre-compensation method in the embodiment of the present invention is applicable to the amplification circuits 10 to 30 or the recurrent DPD-PA cascade model 40, and may be compiled into program code, which may be stored in a storage circuit and executed by a processing circuit. The pre-compensation method may include the following steps: performing pre-distortion according to at least one parameter or at least one hyperparameter to convert a first input signal into a first pre-distortion output signal; updating the at least one parameter or the at least one hyperparameter according to Bayesian Optimization, Causal Bayesian Optimization, or Dynamic Causal Bayesian Optimization; and performing pre-distortion according to the at least one parameter updated or the at least one hyperparameter updated to convert a second input signal into a second pre-distortion output signal.
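A rough sketch of these steps as program code is given below; pre_distort and update_by_bayesian_optimization are hypothetical placeholder routines (toy stand-ins for the DPD actuator and the training module), not the actual implementations.

```python
# Sketch of the three steps above; pre_distort and update_by_bayesian_optimization
# are hypothetical placeholders (toy stand-ins), not the actual routines.
import numpy as np

def pre_distort(x, params):
    a1, a3 = params                                   # illustrative parameters
    return a1 * x + a3 * x * np.abs(x) ** 2

def update_by_bayesian_optimization(params):
    a1, a3 = params                                   # stand-in for the BO update above
    return (a1, a3 + 0.01)

def pre_compensation_method(x_first, x_second, params):
    z_first = pre_distort(x_first, params)            # step 1: pre-distort first input
    params = update_by_bayesian_optimization(params)  # step 2: update parameters
    z_second = pre_distort(x_second, params)          # step 3: pre-distort second input
    return z_first, z_second, params
```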
In one embodiment, the storage circuit is configured to store image data or instructions. The storage circuit may be a read-only memory (ROM), a flash memory, a random access memory (RAM), a hard disk, a non-volatile storage device, or a non-transitory computer-readable medium, but is not limited thereto. In one embodiment, the processing circuit is configured to execute the instructions (stored in the storage circuit). The processing circuit may be a microprocessor or an application-specific integrated circuit (ASIC), but is not limited thereto. The amplification circuit 10, 20, or 30 may be utilized in a radio unit (RU), but is not limited thereto.
In summary, the present invention proposes a recurrent DPD-PA cascade model trained by Dynamic Causal Bayesian Optimization. (The DPD is actually a “copy of PA−1”.) In this way, computational complexity and the amount of calculation can be greatly reduced. The DPD may be updated by the copy of PA−1 in real time, as opposed to at intervals, which may effectively compensate for deficiencies of a power amplifier such as nonlinearity.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.