The present disclosure generally relates to the field of satellite communication technology and, more particularly, relates to an on-ground physical-model based predistorter, a machine-learning based predistorter controller, and a high power amplifier linearization method.
In satellite communication (SATCOM) systems, a simple "bent-pipe" transponder is widely adopted to convert uplink carrier frequencies to downlink carrier frequencies for transmission of information without on-board processing capability. The transponders are equipped with high power amplifiers (HPAs), which, like other amplifier modules in communication systems, cause nonlinear distortions to transmitted signals when the HPAs are operated at or close to their saturation points to maximize power efficiency. These nonlinearities can be characterized as amplitude modulation-to-amplitude modulation (AM-AM) and amplitude modulation-to-phase modulation (AM-PM) effects, which degrade the transmission performance of the system.
In order to maximize the transponder throughput along with the HPA power efficiency, additional processing techniques have to be put in place. Predistortion (PD) is one of the most effective methods to compensate for HPA nonlinearities. The principle of the PD is to distort the HPA input signal by an additional device whose characteristics are the inverse of those of the amplifier. In the past, predistortion techniques have employed look-up table (LUT) methods, polynomial methods, channel inversion methods, etc. However, the presence of radio interference brings challenges to SATCOM systems.
To improve the HPA's linearity for the bent-pipe transponder and reduce the transponder's vulnerability to interference, the present disclosure provides a PD solution (e.g., a high power amplifier linearization method) for SATCOM that takes current and future satellite configurations, as well as practical implementation, into account. According to the present disclosure, only on-ground modifications are needed in order to satisfy the low-cost demand. The PD solution adopts a physical-model based PD and a machine-learning based PD controller such that HPA linearity can be significantly improved in the presence of radio interference. In the physical-model based PD, the proposed solution accurately models the PD based on a carefully-selected HPA model and successfully identifies the PD parameters. The machine-learning based PD controller is deployed to deal with the impact of external factors, such as equipment imperfections, temperature variation, and interference signals, and also feeds an error-correction signal to the PD to adjust the parameters in a real-time fashion.
One aspect of the present disclosure provides a high power amplifier (HPA) linearization method, applied to a ground hub which includes a predistorter (PD) and a PD controller. The PD controller stores a database including a set of environmental parameters and a set of PD parameters corresponding to the set of environmental parameters. The ground hub is arranged in a SATCOM system together with a transmitter and a satellite transponder, and the satellite transponder includes an HPA. The HPA linearization method includes determining an initial correction signal including an AM-AM correction signal and an AM-PM correction signal based on a physical model with a plurality of PD parameters, and sending the initial correction signal to the transmitter to compensate AM-AM and AM-PM characteristics of the HPA; receiving a signal from the satellite transponder; determining a reward function for an action taken by the PD controller based on the received signal and environmental parameters; examining an action-value function for actions taken in a preset past period based on reward functions of the actions; taking an action, based on the environmental parameters and the action-value function, to adjust the plurality of PD parameters for the PD to generate an updated correction signal; sending the updated correction signal to the transmitter to compensate the AM-AM and AM-PM characteristics of the HPA; and updating the database in a real-time manner according to the environmental parameters, the plurality of PD parameters, and the action-value function.
Another aspect of the present disclosure provides a PD disposed in a ground hub for HPA linearization. The ground hub is arranged in a SATCOM system together with a transmitter and a satellite transponder, and the satellite transponder includes an HPA. The PD includes a processor and a memory. The memory is configured to store computer-executable instructions. When the computer-executable instructions are executed, the processor is configured to determine an initial correction signal, including an AM-AM correction signal and an AM-PM correction signal, based on a physical model with a plurality of PD parameters, and send the initial correction signal to the transmitter to compensate AM-AM and AM-PM characteristics of the HPA; and, in response to the PD controller taking an action to adjust the plurality of PD parameters, generate an updated correction signal, and send the updated correction signal to the transmitter to compensate the AM-AM and AM-PM characteristics of the HPA.
Another aspect of the present disclosure provides a PD controller arranged in a ground hub. The ground hub further includes a PD and is arranged in a SATCOM system together with a transmitter and a satellite transponder, the satellite transponder includes an HPA, and the PD controller receives a signal from the satellite transponder. The PD controller includes a processor, a memory, and a data storage. The memory is configured to store computer-executable instructions, and the data storage is configured to store a database including a set of environmental parameters and a set of PD parameters corresponding to the set of environmental parameters. When the processor executes the computer-executable instructions, the PD controller determines a reward function for an action taken by the PD controller based on the received signal and environmental parameters; examines an action-value function for actions taken in a preset past period based on reward functions of the actions; takes an action, based on the measured environmental parameters and the action-value function, to adjust a plurality of PD parameters for the PD to generate an updated correction signal; and updates the database, including the set of environmental parameters and the set of PD parameters, in a real-time manner according to the environmental parameters, the plurality of PD parameters, and the action-value function.
Other aspects of the present disclosure can be understood by those skilled in the art in light of the description, the claims, and the drawings of the present disclosure.
The following drawings are merely examples for illustrative purposes according to various disclosed embodiments and are not intended to limit the scope of the present disclosure.
Reference will now be made in detail to exemplary embodiments of the invention, which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The accompanying drawings illustrate the considered system concept of operations (CONOPS) of the present disclosure. According to the CONOPS, ground users communicate with each other via satellite uplinks (U/L) and downlinks (D/L), and each satellite serves as a transmission relay node. According to the system model described above, the satellite D/L transmission signal may suffer from the AM-AM and AM-PM distortion effects when the HPA of the satellite is operated at the saturation point. In practice, to avoid the signal distortion caused by HPA nonlinearity, the HPA is normally set at 3 dB below its saturation point. However, because the U/L interference signal can adaptively push the input signal power of the HPA to its saturation point and cause nonlinear signal distortion, the presence of the interference brings a new challenge to the HPA linearization problem.
To tackle the addressed nonlinear problems of HPAs in the presence of radio interference, the present disclosure provides a high power amplifier linearization method, an on-ground physical-model based PD, and a machine-learning based PD controller. According to various embodiments of the present disclosure, an affordable PD solution with only on-ground modifications may be adopted to achieve HPA linearization.
Specifically, corresponding to the considered CONOPS described above, the on-ground PD may be able to estimate and correct the AM-AM and AM-PM nonlinear distortions by itself when interference signals and other external factors are not the cause of distortion. When the U/L jammer or other external factors, such as equipment imperfections and non-ambient temperature, drive the HPA input power to the saturation point, the on-ground PD controller may use a machine learning model pre-trained by various transmit/receive signals and HPA characteristics to adjust the parameters of the PD physical model, and may send the correction signal to the transmitter.
According to the present disclosure, to compensate the AM-AM and AM-PM effects of the HPA, an accurate physical-model based PD may be developed based on an extended Saleh's model. Saleh's model is a commonly used power amplifier model and has been proposed for accurately characterizing an HPA's amplitude (AM-AM) and phase (AM-PM) distortions. The proposed extended model is designed especially for travelling wave tube (TWT) and/or solid-state HPAs. The extended Saleh's model may account for eight unknown parameters. In the following, a detailed description of the eight parameters of the extended Saleh's model will be provided.
X(t) = ρx(t)e^(jθx(t)) (1)
Y(t) = ρy(t)e^(jθy(t)) (2)
Z(t) = ρz(t)e^(jθz(t)) (3)
where ρx(t), ρy(t), ρz(t), and θx(t), θy(t), θz(t) are the amplitude and phase of the complex signals X(t), Y(t), and Z(t), respectively. By using M(ρy(t)) and Φ(ρy(t)) to respectively represent the normalized AM-AM and AM-PM responses of the HPA due to the input signal X(t), Eq. (3) may then become:
Z(t) = M(ρy(t))e^(j(θy(t)+Φ(ρy(t)))) (4)
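The mapping of Eq. (4) can be illustrated with a short sketch. Since the extended model's equations are not reproduced above, the sketch below uses the classic two-parameter Saleh forms M(ρ) = aρ/(1 + bρ²) and Φ(ρ) = αρ²/(1 + βρ²) with illustrative coefficient values as stand-ins; it is not the disclosed eight-parameter model.

```python
import cmath

# Classic Saleh AM-AM and AM-PM responses (illustrative stand-ins for
# the extended model, whose equations are not reproduced here).
def saleh_am_am(rho, a=2.0, b=1.0):
    return a * rho / (1.0 + b * rho ** 2)

def saleh_am_pm(rho, alpha=2.0, beta=2.0):
    return alpha * rho ** 2 / (1.0 + beta * rho ** 2)

def hpa_output(y):
    """Apply Eq. (4): Z = M(rho_y) * exp(j*(theta_y + Phi(rho_y)))."""
    rho_y = abs(y)
    theta_y = cmath.phase(y)
    return saleh_am_am(rho_y) * cmath.exp(1j * (theta_y + saleh_am_pm(rho_y)))

z = hpa_output(0.5 + 0.0j)   # input amplitude 0.5, phase 0
# The output amplitude is compressed relative to a linear gain (AM-AM),
# and the output phase is rotated by Phi (AM-PM).
```

The same structure applies whichever M and Φ are fitted; only the two response functions change.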
Then, the original Saleh's model may be extended for the HPA by including eight extra parameters (a0, a1, b0, and b1, together with α0, α1, β0, and β1), resulting in the following generalized equations for M(ρy(t)) and Φ(ρy(t)):
The eight unknown coefficients a0, a1, b0, b1, α0, α1, β0 and β1 may be computed to give the best fit of the parameter-dependent Eqs. (5) and (6) to the measured data. Note that these unknown parameters will be adjusted in the presence of interference by the proposed machine-learning based PD controller.
Further, from Eq. (2), exp(jθy(t)) may be written as:
e^(jθy(t)) = Y(t)/ρy(t) (7)
By using Eqs. (5), (6), and (7) to substitute the corresponding terms in Eq. (4), then
Further, from Eq. (8), it is obvious that the desired PD output for a given input X(t) may be:
In detail, when the measured data (data for normal ambient temperature) of Eqs. (5) and (6) are provided, estimating the coefficients qAM = (a0, b0, α0, β0) and qPM = (a1, b1, α1, β1) may be a problem to consider. That is, after providing the measured data Md(ρy(ti)) for the AM-AM response of the HPA at time ti, i = 1, 2, . . . , N, it may further be necessary to determine a best estimate q̂AM by using a least-squares criterion to minimize
J(qAM) = Σ(i=1..N) |M(ρy(ti), qAM) − Md(ρy(ti))|^2 (10)
where M(ρy(ti), qAM) is the normalized AM-AM response of the HPA due to the input signal Y(ti) at each time ti, i = 1, 2, . . . , N, corresponding to the parameter set qAM. The parameter estimation problem for the parameter set qPM can be formulated similarly.
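The least-squares objective of Eq. (10) can be sketched as follows. The AM-AM form below is the classic Saleh expression, used as an illustrative stand-in since the extended eight-parameter equations are not reproduced here, and the "measured" data are synthetic.

```python
# Least-squares cost J(q) of Eq. (10): sum of squared errors between the
# model response M(rho, q) and the measured data Md(rho) at sample times.
def am_am(rho, q):
    a, b = q  # classic Saleh parameters (illustrative stand-in)
    return a * rho / (1.0 + b * rho ** 2)

def cost(q, samples):
    """samples: list of (rho_y(t_i), Md(rho_y(t_i))) pairs."""
    return sum((am_am(rho, q) - md) ** 2 for rho, md in samples)

# Synthetic "measured" data generated from known parameters (2.0, 1.0).
true_q = (2.0, 1.0)
data = [(0.1 * i, am_am(0.1 * i, true_q)) for i in range(1, 11)]

# The cost vanishes at the true parameters and grows away from them,
# which is what the minimization exploits.
assert cost(true_q, data) == 0.0
assert cost((1.5, 1.0), data) > 0.0
```

In practice Md would come from HPA measurements rather than from the model itself.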
The above optimization problem is an unconstrained minimization problem. The method used to obtain the solution to this minimization problem is based on function information computed on sequences of simplexes and is known as the Nelder-Mead algorithm. Basically, given the optimization problem
min J(q)
where q = qAM or qPM ∈ R^4, the Nelder-Mead algorithm may maintain a simplex of approximations to an optimal point. Further, it is assumed that the vertices {qj}, j = 1, . . . , n+1, are sorted according to the objective function values
J(q1) ≤ J(q2) ≤ . . . ≤ J(qn+1) (11)
The point q1 may be referred to as the best vertex and qn+1 may be the worst vertex. The algorithm may attempt to change the worst vertex qn+1 to a new point using the formula
q(δ) = (1 + δ)q̄ − δqn+1 (12)
Here,
q̄ = (1/n) Σ(j=1..n) qj (13)
is the centroid of the n best vertices.
The typical values for the parameter δ may be
δ = {δr, δe, δo, δi} = {1, 2, 0.5, −0.5} (14)
which corresponds to the reflection, expansion, outside contraction, and inside contraction steps of the Nelder-Mead iteration.
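The trial-point formula of Eqs. (12)-(14) can be sketched directly. The simplex below is a hypothetical example in R², not one arising from the actual HPA fit.

```python
# Nelder-Mead trial point q(delta) = (1 + delta)*qbar - delta*q_worst,
# where qbar is the centroid of the n best vertices (Eqs. (12)-(13)).
def centroid(vertices):
    n = len(vertices) - 1          # exclude the worst (last) vertex
    dim = len(vertices[0])
    return [sum(v[k] for v in vertices[:n]) / n for k in range(dim)]

def trial_point(vertices, delta):
    qbar = centroid(vertices)
    worst = vertices[-1]
    return [(1 + delta) * qb - delta * qw for qb, qw in zip(qbar, worst)]

# Vertices sorted best-to-worst (hypothetical simplex in R^2).
simplex = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
deltas = {"reflection": 1, "expansion": 2, "outside": 0.5, "inside": -0.5}

# Reflecting the worst vertex (1, 1) through the centroid (0.5, 0)
# of the two best vertices gives (0.0, -1.0).
reflected = trial_point(simplex, deltas["reflection"])
```

Each of the four δ values generates a candidate along the same line through the centroid and the worst vertex.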
The algorithm may terminate when either |J(q1) − J(qn+1)| is sufficiently small or the number of function evaluations is larger than a user-prescribed value. In one embodiment, the optimal sets of parameters, which give the desired least-squares fit to the HPA measured data, may be given by
qAM = {3.6407, 0.3063, 11.1163, 4.2947} (15)
and
qPM = {0.4978, 0.1273, 74.6172, 1.0879} (16)
With these estimated parameters, the derived AM-AM and AM-PM characteristics of the HPA according to Eqs. (5) and (6) are plotted in the accompanying drawings.
Further, the amplitude and the phase of the PD may be directly derived from Eq. (9), and the parameters obtained by modeling Eqs. (15) and (16) for the HPA may be copied to the PD. By substituting Eq. (1) into Eq. (9), the amplitude and the phase of the signal at the output of the PD may be
It should be mentioned that the AM-AM relationship shown in Eq. (17) is usually expressed in terms of normalized input and output voltages or normalized input and output power. Solving for ρy(t) from Eq. (17) and selecting only the negative sign for the square-root term in the numerator, ρy(t) may be equal to
It should be noted that, because of the normalized input and output relationship, the negative sign is chosen to meet the dual conditions that the square-root term is real and 0 ≤ ρy(t) ≤ 1. When the amplitude and phase of the desired PD are denoted ρPD and θPD, respectively, the output Y(t) of the PD may then be rewritten as
Y(t) = X(t)ρPD e^(jθPD) (20)
Further, by substituting Eqs. (18) and (19) into Eq. (20) and equating the amplitude and phase terms separately for the desired PD, ρPD = ρy(t)/ρx(t) may then become
It should be noted that the saturation condition for the HPA is implied in Eq. (21). With Eqs. (21) and (22) derived, along with the curve-fitting values in Eqs. (15) and (16), the amplitude and phase responses of the PD can be plotted accordingly.
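The negative-root selection described above can be sketched with the classic two-parameter Saleh AM-AM form M(ρ) = aρ/(1 + bρ²), used here as an illustrative stand-in since Eqs. (17)-(18) are not reproduced. Solving m = aρ/(1 + bρ²) for ρ yields a quadratic, and the negative sign of the square-root term keeps the root real and small:

```python
import math

def saleh_am_am(rho, a=2.0, b=1.0):
    return a * rho / (1.0 + b * rho ** 2)

def invert_am_am(m, a=2.0, b=1.0):
    """Solve m = a*rho/(1 + b*rho^2) for rho, taking the negative sign of
    the square-root term so that the root is real and lies in [0, 1]."""
    disc = a * a - 4.0 * b * m * m
    if disc < 0:
        raise ValueError("desired output exceeds the saturation level")
    return (a - math.sqrt(disc)) / (2.0 * b * m)

rho = invert_am_am(0.8)   # predistorted amplitude for desired output 0.8
# Round trip: feeding the predistorted amplitude back through the AM-AM
# curve recovers the desired output amplitude.
assert abs(saleh_am_am(rho) - 0.8) < 1e-9
```

Taking the positive sign instead would return the root beyond saturation, which is why the text selects the negative branch.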
In one embodiment, the impact of the disclosed PD on the bit error rate (BER) performance may be numerically evaluated. To establish the reference BER performance, a computer simulation may be implemented based on the simplified system set-up in which the PD is placed on-board before the HPA.
To assess the end-to-end BER performance, a simplified transponder simulation model developed based on an actual transponder model may be adopted. For a high-resolution BER curve, a simplified transponder model that characterizes the transponder imperfections may be needed to reduce simulation time from many hours to the order of ten minutes. The simplified transponder model may characterize transponder intermodulation (IM) noise caused by HPA nonlinearity and transponder hardware imperfections, such as phase noise, amplitude ripple/phase shift due to filtering, quantization noise, and automatic gain control (AGC) effects. It should be noted that the IM effect is characterized by the noise power ratio (NPR). In addition, this simplified transponder model may also characterize actual waveform distortion due to the AM-AM and AM-PM effects of the HPA.
In order to generate test signals using additive white Gaussian noise (AWGN) to obtain NPR values, an industry standard method may be adopted: the white noise may be first passed through a band pass filter (BPF) to produce an approximately square pedestal of noise of about the same bandwidth as the signals being simulated. This signal may then be passed through a narrow-band reject filter to produce a deep notch (typically >50 dB) at the center of the noise pedestal. The noise signal may be used to excite the test amplifier. Amplification may produce intermodulation products, which tend to fill in the notch. The depth of the notch at the output of the amplifier is the measure of the NPR.
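The notch-filling effect of intermodulation can be illustrated on a toy scale. The sketch below passes a two-tone signal through a hypothetical memoryless cubic nonlinearity and checks, via a naive DFT, that a third-order product appears at 2f1 − f2; it is a conceptual stand-in for the wideband NPR test, not the actual transponder model.

```python
import math

N = 64
f1, f2 = 10, 12                        # tone bins (cycles per N samples)

def dft_mag(x, k):
    """Magnitude of bin k of a naive DFT (no external libraries)."""
    re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    im = sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
    return math.hypot(re, im)

x = [math.cos(2 * math.pi * f1 * n / N) + math.cos(2 * math.pi * f2 * n / N)
     for n in range(N)]
y = [s + 0.3 * s ** 3 for s in x]      # hypothetical cubic nonlinearity

# The linear input has no energy at 2*f1 - f2 = 8, but the amplifier
# output does: the third-order product "fills in" that frequency,
# exactly the mechanism that fills the NPR notch.
assert dft_mag(x, 2 * f1 - f2) < 1e-6
assert dft_mag(y, 2 * f1 - f2) > 1.0
```

In the actual test the two tones are replaced by a notched noise pedestal, and the NPR is the dB depth of the residual notch.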
In one embodiment, the measurement of NPR may be performed by using an actual wideband transponder model. The test signal may be generated using the AWGN model within 125 MHz bandwidth to represent a multicarrier signal. The test signal may then be passed through filters (bandpass and notch) to shape the test signal to the specific test signal, which is used to excite the actual satellite transponder model.
The BPF model implemented in MATLAB may use the finite impulse response (FIR) implementation approach with a filter order of 120, a passband bandwidth of 125 MHz, and a center at 250 MHz. The notched filter specifications used in the setup of this test may be: (i) notched depth=−60 dB; (ii) notched bandwidth=7 MHz, and (iii) notched frequency=250 MHz. After the notch filter, the shaped AWGN test signal may then be passed through the actual transponder model with transponder noises on, and the output of the transponder may then be analyzed by calculating the power spectral density of the output signal. The simulated NPR test results may then be collected for typical X-band, Ku-band and Ka-band HPAs.
The NPR simulation test may be performed for typical X-band, Ku-band and Ka-band HPAs operating at saturation. With the input power back-off (IPBO) equal to 0 dB, 5 dB, 7 dB, 10 dB, 13 dB, and 15 dB, respectively, the results were captured in Table 1. Referring to Table 1, the NPR results for the X, Ku and Ka bands may be about the same, except that at IPBO = 0 dB, the NPR for Ka-band may be worse than that for the X and Ku bands.
To assess the performance of the linearized transponder model with a PD integrated, and to understand the impact of the disclosed linearization method on practical tactical waveforms, an exemplary simulation set-up is shown in the accompanying drawings.
According to the present disclosure, the NPR using the AWGN noise may be employed as the test signal. Table 1 shows that the NPR at IPBO=0 dB for Ka-band is −17.1 dB. The overall bit signal-to-noise ratio (BSNR) for the system described in Eq. (23) may be as follows
where BSNRov is the overall BSNR, BSNRD/L is the downlink BSNR, and C/IM is the carrier-to-intermodulation noise power ratio, determined by the measured NPR.
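As a numerical illustration, noise-like contributions are commonly combined reciprocally in link budgets. The sketch below assumes the overall BSNR combines the downlink BSNR and the carrier-to-intermodulation ratio as 1/BSNRov = 1/BSNRD/L + 1/(C/IM); this combining rule is an assumption, since the defining equation is not reproduced above.

```python
import math

def db_to_lin(x_db):
    return 10.0 ** (x_db / 10.0)

def lin_to_db(x):
    return 10.0 * math.log10(x)

def overall_bsnr_db(bsnr_dl_db, c_to_im_db):
    """Assumed reciprocal combining: 1/BSNRov = 1/BSNRdl + 1/(C/IM)."""
    total = 1.0 / db_to_lin(bsnr_dl_db) + 1.0 / db_to_lin(c_to_im_db)
    return lin_to_db(1.0 / total)

# With a strong C/IM the downlink BSNR dominates; as C/IM degrades toward
# the downlink BSNR, the overall figure drops by up to 3 dB.
print(overall_bsnr_db(10.0, 20.0))   # slightly below 10 dB
```

This is why improving the NPR with the PD (e.g., from −17.1 dB toward −20 dB) directly lifts the overall BSNR.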
At the receiver, imperfect carrier tracking may be assumed. For example, the carrier phase tracking error is a random variable with a Gaussian distribution having a zero mean and a variance characterized by the bandwidth-time product δ. Then, the BER performance of a quadrature phase shift keying (QPSK) modem may be calculated for the NPR transponder model with the PD.
For example, from the plot of the BER curves for the QPSK signal passing through the NPR transponder model with and without a PD, it can be observed that the extended Saleh's PD model may provide about 1 dB BSNR gain at BER = 0.001. The theoretical QPSK curve and the imperfect-carrier-tracking curve without passing through the transponder may also be provided for reference. It should be noted that, because predistortion linearizes the system so that fewer intermodulation products are generated, −20 dB is adopted in Table 1 for the NPR in the case with the PD.
Further, the BER performance for the QPSK signal passing through the NPR transponder model may be compared with the PD at Ka-band with temperature as a dynamic parameter. In one embodiment, the BER performance may degrade when the HPA ambient temperature varies from 25° C. to 30° C., since the optimal parameters of the extended Saleh's PD model were derived based on the HPA data at 25° C. These parameters will be optimized for 27° C. and 30° C. by using a machine learning algorithm according to various embodiments of the present disclosure.
Digital predistortion may be a baseband signal processing approach that compensates for the power and phase shifts caused by power amplifiers. According to the present disclosure, the spectral regrowth may be used to evaluate the PD performance. The term spectral regrowth was first used to describe the intermodulation products that are generated when a digital transmitter is added to an analog transmission system. In one embodiment, the spectral regrowth is mainly caused by the nonlinearity of the HPA.
According to the disclosed PD, a curve-fitting algorithm may be adopted to estimate the AM-AM and AM-PM nonlinear distortions. In one embodiment, based on an extended Saleh's model, the measured AM-AM characteristics of the HPA may be fitted by
where ρy(t) is the amplitude of the input complex signal, and a0, b0, α0, and β0 are PD parameters, and the measured AM-PM characteristics of the HPA may be fitted by
where a1, b1, α1, and β1 are PD parameters. When fitting the normalized AM-AM response or the normalized AM-PM response of the HPA, a least-squares criterion may be used to determine the best fit.
Further, the normalized correction signal may be determined using the PD parameters obtained above. In one embodiment, based on the extended Saleh's model, the normalized correction signal may be determined using a0, b0, α0, β0, a1, b1, α1, and β1 obtained above. The AM-AM correction signal and the AM-PM correction signal may be
respectively.
Further, the normalized correction signal may be sent to the HPA through a transmitter that connects to the output terminal of the on-ground physical-model based PD via a control channel. As such, compensation for the AM-AM and AM-PM characteristics of the HPA may be provided and HPA linearization may be achieved.
The present disclosure also provides a machine-learning based PD controller for HPA linearization. The machine-learning based PD controller may be able to promptly determine the PD parameters based on past experience and correspondingly update the PD parameters. Therefore, the disclosed machine-learning based PD controller may provide dynamic and real-time control for optimizing the performance of the PD.
Humans are experts in solving a wide variety of challenging problems, from low-level motor control to high-level cognitive tasks. The disclosed machine-learning based adaptive on-ground PD controller aims to create artificial agents that can achieve a similar level of performance and generality. Like a human, the agents of the present disclosure may learn for themselves to achieve successful strategies which lead to desired long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL).
Formally, the RL environment can be described as a Markov decision process (MDP), which includes a set of states S, plus a distribution of starting states p(s0), a set of actions A, transition dynamics T(st+1|st, at) that map a state-action pair at time t to a distribution of states at time t+1, a reward function r(st, at, st+1), and a discount factor γ ∈ [0, 1], where smaller values place more emphasis on immediate rewards. According to the disclosed PD controller, when estimating an action-value function, the discount factor γ may be used to reduce the weight of the reward function corresponding to the former action in every two consecutive actions. It is supposed that an agent interacts with an environment E in a sequence of actions, observations, and rewards. At each time-step, the agent selects an action at ∈ A, A = {1, . . . , K}; the action may be passed to the environment and may modify its internal state and the corresponding reward. The system's internal state may not be observable to the agent most of the time; instead, various target features of interest from the environment, such as the signal features discussed in the following, may be observed. In addition, a reward r representing the change in the overall system performance may be received.
According to the present disclosure, an RL approach may be leveraged to achieve optimum linearization performance in various environmental conditions.
To specifically solve the PD problem, the states, actions, and reward function may be defined as follows:
Actions: an action may be an operation that the PD controller takes to adjust (increase or decrease within a certain range) one of the eight parameters of the extended Saleh's model, namely (a0, b0, α0, β0) and (a1, b1, α1, β1). The first four parameters directly correspond to the AM-AM relation, and the AM-PM relation is characterized based on all eight parameters together.
States: a state may be an (at least partially) observable status of the PD controller's relation with the environment, and may be defined as the power and phase of the output signal.
Rewards: for a given "Input Power" (or Pin) and "Input Phase" (or Phasein), the reward function may be based on the distance between the estimated and measured output power and phase of the HPA, namely Δ = √((Pout,est − Pout,meas)² + (Phaseout,est − Phaseout,meas)²).
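The reward above can be sketched numerically. The sketch uses the negative of the distance, a sign convention assumed here so that smaller distances yield larger rewards, and the argument names are illustrative.

```python
import math

def reward(p_est, phase_est, p_meas, phase_meas):
    """Negative Euclidean distance between estimated and measured
    (output power, output phase); a perfect estimate gives reward 0."""
    return -math.hypot(p_est - p_meas, phase_est - phase_meas)

# A closer estimate earns a larger (less negative) reward.
assert reward(1.0, 0.1, 1.0, 0.1) == 0.0
assert reward(1.0, 0.1, 1.1, 0.1) > reward(1.0, 0.1, 1.5, 0.1)
```

Weighting the power and phase terms differently, as mentioned later for the AM-AM/AM-PM weights, only changes the two coordinates' scales.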
To accurately model a PD in the presence of interference signals or imperfect equipment, the goal of the agent is to interact with the environment by correctly classifying the interference to maximize future rewards. In one embodiment, a standard assumption may be made: the future rewards are discounted by a factor of γ per time-step, and the future discounted return at time t may be defined as
Rt = Σ(t′=t..T) γ^(t′−t) rt′
where T is the time-step at which the process terminates. The optimal action-value function Q*(s, a) may be defined as the maximum expected return achievable by following a strategy, after seeing some sequence s and then taking some action a,
Q*(s, a) = max_π E[Rt | st = s, at = a, π]
where π is a policy mapping sequences to actions.
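The future discounted return defined above can be computed directly; the reward sequence below is hypothetical.

```python
def discounted_return(rewards, gamma):
    """R_t = sum over t' >= t of gamma^(t'-t) * r_t', here with t = 0."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# gamma < 1 weights immediate rewards more heavily than distant ones,
# matching the discount factor's role described above.
assert discounted_return([1.0, 1.0, 1.0], 0.5) == 1.75
assert discounted_return([1.0, 1.0, 1.0], 1.0) == 3.0
```

With γ = 0.5 each successive reward counts half as much; with γ = 1 all rewards count equally.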
The optimal action-value function may satisfy an important identity known as the Bellman equation,
Q*(s, a) = E_s′[r + γ max_a′ Q*(s′, a′) | s, a]
If the optimal value Q*(s′, a′) of the sequence s′ at the next time-step is known for all possible actions a′, then the optimal strategy may be to select the action a′ that maximizes the expected value of r + γQ*(s′, a′). The basic idea behind many RL algorithms is to estimate the action-value function by using the Bellman equation as an iterative update,
Qi+1(s, a) = E[r + γ max_a′ Qi(s′, a′) | s, a]
Such value iteration algorithms converge to the optimal action-value function, Qi → Q*, as i → ∞. In practice, this basic approach may be totally impractical, because the action-value function is estimated separately for each sequence without any generalization. Instead, a function approximator may be commonly used to estimate the action-value function, Q(s, a; θ) ≈ Q*(s, a). In RL, a linear function approximator may be typically used, but sometimes a non-linear function approximator, such as a neural network, may be adopted instead. A Q-network may utilize a neural network as the value-function approximator. A Q-network can be trained by minimizing a sequence of loss functions
Li(θi) = E_{s,a∼ρ(·)}[(yi − Q(s, a; θi))²] (26)
where yi = E_s′[r + γ max_a′ Q(s′, a′; θi−1) | s, a] is the target for iteration i, and ρ(s, a) is a probability distribution over sequences s and actions a, which may be referred to as the behavior distribution. The parameters from the previous iteration, θi−1, may be held fixed when optimizing the loss function Li(θi). Differentiating the loss function with respect to the weights, the gradient can be described as follows:
∇θi Li(θi) = E_{s,a∼ρ(·); s′}[(r + γ max_a′ Q(s′, a′; θi−1) − Q(s, a; θi)) ∇θi Q(s, a; θi)]
Rather than computing the full expectation in the above gradient, it is often computationally expedient to optimize the loss function by stochastic gradient descent. The weights may be updated after every time-step, the expectations may be replaced by single samples from the behavior distribution ρ and the environment, respectively, and Q-learning may be developed based on the discussion above.
It should be noted that the above algorithm is model-free, and may solve the RL task directly using samples from the environment E, without explicitly constructing an estimate of E. Also, the algorithm may learn the optimal policy through the greedy strategy a = argmax_a Q(s, a; θ), while following a behavior distribution that ensures adequate exploration of the state space. In practice, the behavior distribution may often be selected by an ϵ-greedy strategy that follows the greedy strategy with probability 1 − ϵ and selects a random action with probability ϵ.
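The tabular case of the update and exploration strategy described above can be sketched as follows. The toy environment (two states, two actions) is hypothetical and stands in for the PD parameter-adjustment task.

```python
import random

random.seed(0)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):
    """Hypothetical toy environment: action 1 pays off in either state."""
    reward = 1.0 if a == 1 else 0.0
    return (s + 1) % N_STATES, reward

def choose(s):
    """epsilon-greedy: random action w.p. EPS, else the greedy action."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

s = 0
for _ in range(500):
    a = choose(s)
    s2, r = step(s, a)
    # Q-learning update toward the Bellman target r + gamma * max_a' Q(s', a')
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    s = s2

# After training, the greedy policy prefers the rewarding action in both states.
assert all(Q[st][1] > Q[st][0] for st in range(N_STATES))
```

In the disclosed controller the state and action spaces are the HPA signal features and the eight-parameter adjustments, and the tabular Q is replaced by a function approximator.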
In the following, a scenario in which the PD is on the ground with noise/interference (AWGN, constant, and partial time partial band (PTPB)) existing between the PD and the HPA is provided as an example for further illustration. It has been shown through simulation and hardware implementation that the proposed machine learning approach can successfully update the PD coefficients to better adjust to the ambient environment efficiently. The linearization performance and end-to-end BER performance in different scenarios were evaluated, with significant improvements compared to those obtained by solely using a physical-model based PD.
In a first case regarding the on-ground PD design, it is supposed that AWGN exists between the PD and the HPA. The noise may be characterized based on Eb/N0, and the parameter set for the PD may be the optimal value for the ambient environment of 25° C. The corresponding BER performance indicates that, at the same BER, there may be a gap of approximately 2 dB. The proposed machine learning approach can update the PD parameters based on the reward defined above to improve the HPA linearity.
In one embodiment, after being processed using the machine learning model, an updated parameter set, including a0 = 10.956, b0 = 0.1930, α0 = 15.2576, β0 = 3.4007, a1 = 0.2261, b1 = 0.2354, α1 = 61.4107, and β1 = 1.0755, may be achieved, and at the same BER, the performance improvement may be approximately 1 dB.
Moreover, a case in which the PTPB interference exists between the PD and the HPA may also be studied. For example, the PTPB interference with activity rates of 0.2 and 0.9 may heavily affect the HPA linearity. The initial parameter set for the PD may be achieved based on the extended Saleh's model. The proposed machine learning approaches can dynamically update the PD parameter set to adjust to the changing environment.
The updated PD may be able to significantly improve the BER curve, which indicates that the proposed approach can improve the system performance in terms of system linearity. In the design of the machine learning approaches, the operator can also set the weights of the AM-AM and AM-PM responses in the learning process. For details of improving the system performance, reference may be made to the corresponding content described in the embodiments of the disclosed satellite transponder linearization method.
The present disclosure provides a PD controller to implement the functions described above. The PD controller may include a processor, a memory, and a data storage. The PD controller may be arranged in a ground hub which further includes a PD consistent with various embodiments of the present disclosure. The ground hub may be arranged in a SATCOM system together with a transmitter and a satellite transponder, the satellite transponder includes an HPA. The PD controller receives a signal from the satellite transponder. Moreover, the memory of the PD controller may be configured to store computer-executable instructions, and the data storage may be configured to store a database that includes a set of environmental parameters and a set of PD parameters corresponding to the set of environmental parameters. When the computer-executable instructions are executed, the processor of the PD controller is configured to determine a reward function for an action taken by the PD controller based on the received signal and environmental parameters; examine an action-value function for actions taken in a preset past period based on reward functions of the actions; take an action, based on the measured environment parameters and the action-value function, to adjust a plurality of PD parameters for the PD to generate an updated correction signal; and update the database, including the set of environmental parameters and the set of PD parameters, in a real-time manner according to the environment parameters, the plurality of PD parameters, and the action-value function.
The present disclosure provides an HPA linearization method, applied to a ground hub which includes a predistorter (PD) and a PD controller. The PD controller stores a database including a set of environmental parameters and a set of PD parameters corresponding to the set of environmental parameters. The ground hub is arranged in a SATCOM system together with a transmitter and a satellite transponder, and the satellite transponder includes an HPA. The HPA linearization method includes determining an initial correction signal including an AM-AM correction signal and an AM-PM correction signal based on a physical model with a plurality of PD parameters, and sending the initial correction signal to the transmitter to compensate AM-AM and AM-PM characteristics of the HPA; receiving a signal from the satellite transponder; determining a reward function for an action taken by the PD controller based on the received signal and environmental parameters; examining an action-value function for actions taken in a preset past period based on reward functions of the actions; taking an action, based on the environmental parameters and the action-value function, to adjust the plurality of PD parameters for the PD to generate an updated correction signal; sending the updated correction signal to the transmitter to compensate the AM-AM and AM-PM characteristics of the HPA; and updating the database in a real-time manner according to the environmental parameters, the plurality of PD parameters, and the action-value function.
The above detailed descriptions only illustrate certain exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention. Those skilled in the art can understand the specification as a whole, and technical features in the various embodiments can be combined into other embodiments understandable to those persons of ordinary skill in the art. Any equivalent or modification thereof, without departing from the spirit and principle of the present invention, falls within the true scope of the present invention.
This invention was made with Government support under Contract No. FA9453-18-P-0233, awarded by the United States Air Force Research Laboratory. The U.S. Government has certain rights in this invention.