Data-driven methods for look up table-free closed-loop antenna impedance tuning

Information

  • Patent Grant
  • 11438850
  • Patent Number
    11,438,850
  • Date Filed
    Friday, November 13, 2020
  • Date Issued
    Tuesday, September 6, 2022
Abstract
According to an embodiment, a method in a closed-loop antenna impedance tuning (CL-AIT) system is provided. The method includes determining whether a transmitted power is above a pre-determined threshold, when the transmitted power is above the pre-determined threshold, determining a bypass input reflection coefficient, determining whether the bypass input reflection coefficient is greater than a bypass threshold, and when the bypass input reflection coefficient is greater than the bypass threshold, determining an optimal tuner code based on a tuner code search algorithm.
Description
FIELD

The present disclosure is generally related to closed-loop antenna impedance tuning (CL-AIT).


BACKGROUND

For radio antenna transmission systems, an impedance mismatch causes power reflection from the antenna and subsequently degrades the overall transmission efficiency due to the loss in power transferred to the antenna. Therefore, impedance tuning to minimize the impedance mismatch loss plays an important role in mobile devices with the limited power supply. A task of impedance tuning is tantamount to configuring a matching network (or a tuner) by properly adjusting its components including switches, capacitors, and inductors.


SUMMARY

According to one embodiment, a method in a CL-AIT system includes determining whether a transmitted power is above a pre-determined threshold, when the transmitted power is above the pre-determined threshold, determining a bypass input reflection coefficient, determining whether the bypass input reflection coefficient is greater than a bypass threshold, and when the bypass input reflection coefficient is greater than the bypass threshold, determining an optimal tuner code based on a tuner code search algorithm.


According to one embodiment, a CL-AIT system includes a memory and a processor configured to determine whether a transmitted power is above a pre-determined threshold, when the transmitted power is above the pre-determined threshold, determine a bypass input reflection coefficient, determine whether the bypass input reflection coefficient is greater than a bypass threshold, and when the bypass input reflection coefficient is greater than the bypass threshold, determine an optimal tuner code based on a tuner code search algorithm.





BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following detailed description, taken in conjunction with the accompanying drawings, in which:



FIG. 1 illustrates a diagram of a CL-AIT system, according to an embodiment;



FIG. 2 illustrates a flowchart showing an operation of a CL-AIT system, according to an embodiment;



FIGS. 3A and 3B illustrate diagrams of transfer function models, according to an embodiment;



FIG. 4 illustrates evolution of features in reinforcement learning (RL) over time, according to an embodiment;



FIG. 5 illustrates a diagram of a CL-AIT system, according to an embodiment;



FIG. 6 illustrates a diagram of a reference tuner model, according to an embodiment;



FIGS. 7A, 7B and 7C illustrate graphs showing performance, according to an embodiment;



FIG. 8 illustrates a flowchart for a method of determining an optimal tuner code, according to an embodiment; and



FIG. 9 illustrates a block diagram of an electronic device in a network environment, according to one embodiment.





DETAILED DESCRIPTION

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings. It should be noted that the same elements will be designated by the same reference numerals although they are shown in different drawings. In the following description, specific details such as detailed configurations and components are merely provided to assist with the overall understanding of the embodiments of the present disclosure. Therefore, it should be apparent to those skilled in the art that various changes and modifications of the embodiments described herein may be made without departing from the scope of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness. The terms described below are terms defined in consideration of the functions in the present disclosure, and may be different according to users, intentions of the users, or customs. Therefore, the definitions of the terms should be determined based on the contents throughout this specification.


The present disclosure may have various modifications and various embodiments, among which embodiments are described below in detail with reference to the accompanying drawings. However, it should be understood that the present disclosure is not limited to the embodiments, but includes all modifications, equivalents, and alternatives within the scope of the present disclosure.


Although the terms including an ordinal number such as first, second, etc. may be used for describing various elements, the structural elements are not restricted by the terms. The terms are only used to distinguish one element from another element. For example, without departing from the scope of the present disclosure, a first structural element may be referred to as a second structural element. Similarly, the second structural element may also be referred to as the first structural element. As used herein, the term “and/or” includes any and all combinations of one or more associated items.


The terms used herein are merely used to describe various embodiments of the present disclosure but are not intended to limit the present disclosure. Singular forms are intended to include plural forms unless the context clearly indicates otherwise. In the present disclosure, it should be understood that the terms “include” or “have” indicate existence of a feature, a number, a step, an operation, a structural element, parts, or a combination thereof, and do not exclude the existence or probability of the addition of one or more other features, numerals, steps, operations, structural elements, parts, or combinations thereof.


Unless defined differently, all terms used herein have the same meanings as those understood by a person skilled in the art to which the present disclosure belongs. Terms such as those defined in a generally used dictionary are to be interpreted to have the same meanings as the contextual meanings in the relevant field of art, and are not to be interpreted to have ideal or excessively formal meanings unless clearly defined in the present disclosure.


The electronic device according to one embodiment may be one of various types of electronic devices. The electronic devices may include, for example, a portable communication device (e.g., a smart phone), a computer, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. According to one embodiment of the disclosure, an electronic device is not limited to those described above.


The terms used in the present disclosure are not intended to limit the present disclosure but are intended to include various changes, equivalents, or replacements for a corresponding embodiment. With regard to the descriptions of the accompanying drawings, similar reference numerals may be used to refer to similar or related elements. A singular form of a noun corresponding to an item may include one or more of the things, unless the relevant context clearly indicates otherwise. As used herein, each of such phrases as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B, or C,” “at least one of A, B, and C,” and “at least one of A, B, or C,” may include all possible combinations of the items enumerated together in a corresponding one of the phrases. As used herein, terms such as “1st,” “2nd,” “first,” and “second” may be used to distinguish a corresponding component from another component, but are not intended to limit the components in other aspects (e.g., importance or order). It is intended that if an element (e.g., a first element) is referred to, with or without the term “operatively” or “communicatively”, as “coupled with,” “coupled to,” “connected with,” or “connected to” another element (e.g., a second element), it indicates that the element may be coupled with the other element directly (e.g., wired), wirelessly, or via a third element.


As used herein, the term “module” may include a unit implemented in hardware, software, or firmware, and may interchangeably be used with other terms, for example, “logic,” “logic block,” “part,” and “circuitry.” A module may be a single integral component, or a minimum unit or part thereof, adapted to perform one or more functions. For example, according to one embodiment, a module may be implemented in a form of an application-specific integrated circuit (ASIC).


A CL-AIT system with automatically configurable matching networks allows compensation of body effects, such as a hand grip that changes the antenna load, by monitoring changes in antenna impedance and adjusting the tuner configuration accordingly. Such compensation is critical for the performance of mobile devices with metallic housing. As opposed to conventional CL-AIT solutions (e.g., based on lookup table (LUT) search), this disclosure provides a data-driven method to configure a tuner for CL-AIT.


Configuring the tuner may be equivalent to finding a tuner code. The optimal tuner code is determined by solving an optimization problem. Two types of cost functions are considered to maximize the instantaneous performance and the asymptotic performance of CL-AIT, respectively. While the former is cost-efficiently tackled by hill-climbing (HC) algorithms, the latter is addressed under the RL framework. In any case, a parametric model is considered for the cost function learned from data. First, disclosed herein is an analytic model to utilize the domain knowledge. Second, disclosed herein is a black box model based on a neural network. Parameters of both models are estimated by solving non-linear least squares problems in offline/online fashion.



FIG. 1 illustrates a diagram of a CL-AIT system, according to an embodiment. The CL-AIT 100 includes an antenna 102, an antenna impedance tuner 104, a tuner control algorithm 106, a feedback receiver 108, a bi-directional coupler 110, and a radio frequency (RF) printed circuit board (PCB) 112 connecting the bi-directional coupler 110 and the antenna impedance tuner 104. The antenna impedance tuner 104 is located between the bi-directional coupler 110 and the antenna 102. The antenna impedance tuner 104 is configured by a tuner code a (typically represented in a binary vector) to set up its switches' state and L/C values. The bi-directional coupler 110 is used to couple transmitted and reflected signals to the feedback path. Then, those captured signals are used to compute the AIT metric, which is an indirect measure of the input reflection coefficient Γin towards the antenna impedance tuner 104, denoted as {hacek over (γ)}in. {hacek over (γ)}bypass denotes γin measured with the default (or bypass) tuner code abypass. According to {hacek over (γ)}bypass, a tuner control algorithm 106 determines the optimal tuner code a* to maximize the power transferred to the antenna 102.



FIG. 2 illustrates a flowchart showing an operation of a CL-AIT system, according to an embodiment. At 202, the system enables the CL-AIT. At 204, the system initializes the CL-AIT. At 206, at time slot t, the system determines whether the transmitted power PTx,t is above the pre-determined threshold ξpow to guarantee the desired signal-to-noise ratio (SNR) of dumped signals and save sufficient power for the antenna to work in an arbitrary environment. If the transmitted power PTx,t is above the pre-determined threshold, then at 208, the system enables a feedback receiver, at 210, the system forwards coupler switching, at 212, the system dumps a forward coupled signal, at 214, the system reverses coupler switching, at 216, the system dumps a reverse coupled signal, and at 218, the system disables the feedback receiver.


At 220, the system determines {hacek over (γ)}bypass,t. At 222, the system determines whether |{hacek over (γ)}bypass,t|>ξbypass, where ξbypass is a bypass threshold determined by the target AIT performance. If |{hacek over (γ)}bypass,t|>ξbypass, at 224, the system determines a* from the tuner control algorithm, and at 226, the system sets the tuner code as at=a*. If |{hacek over (γ)}bypass,t|≤ξbypass, at 228, the system sets the tuner code as at=abypass. At 230, the system may perform a CL-AIT performance measure in embodiments utilizing RL, as is described below.
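The decision logic of FIG. 2 can be sketched in a few lines. This is only an illustrative sketch: the function name `cl_ait_step`, its argument names, and the callables `measure_gamma_bypass` and `search_tuner_code` are hypothetical, and the behavior when the transmitted power is not above the threshold (keeping the previous code) is an assumption, since the flowchart description does not state it.

```python
def cl_ait_step(p_tx, measure_gamma_bypass, search_tuner_code,
                a_bypass, a_previous, xi_pow, xi_bypass):
    """Pick the tuner code for one time slot (decision logic of FIG. 2)."""
    if p_tx <= xi_pow:
        # Assumption: below the power threshold no measurement is taken
        # and the previously applied tuner code is kept.
        return a_previous
    gamma_bypass = measure_gamma_bypass()       # steps 208 to 220
    if abs(gamma_bypass) > xi_bypass:           # step 222
        return search_tuner_code(gamma_bypass)  # steps 224 and 226
    return a_bypass                             # step 228


# Hypothetical usage with stand-in measurement and search callables.
code = cl_ait_step(
    p_tx=1.0,
    measure_gamma_bypass=lambda: 0.5 + 0.1j,
    search_tuner_code=lambda gamma: (1, 0, 1),
    a_bypass=(0, 0, 0),
    a_previous=(1, 1, 1),
    xi_pow=0.5,
    xi_bypass=0.3,
)
```

The search callable is where a hill-climbing or RL-based tuner control algorithm would be plugged in.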


To formulate a problem to find a*, a data-driven cost function needs to be defined by using CL-AIT metric, {hacek over (γ)}in. Γin and Γbypass denote the true reflection coefficients at the tuner input with arbitrary tuner code a and with abypass, respectively. With the known topology of the tuner, a conventional solution may consider a cost based on the analytical expression of Γin with respect to a, Γbypass, and a channel frequency ω. Given Γbypass and ω, a* can be found as a solution of Equation (1):










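$$(\mathrm{P1})\quad a^{*} = \arg\min_{a \in \mathcal{A}} \left| \Gamma_{\mathrm{in}}(a, \Gamma_{\mathrm{bypass}}, \omega) \right| \tag{1}$$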







where 𝒜 is a set of available tuner codes. Equivalently, the voltage standing wave ratio (VSWR) can be adopted as the cost of (P1). However, there is a mismatch between γin (or γbypass) and Γin (or Γbypass) due to the transmission line effect over the RF PCB 112 and other sources of uncertainty in the forward and backward signals at the bi-directional coupler 110. Furthermore, the fact that only {hacek over (γ)}in is available in a CL-AIT system justifies the choice of the data-driven cost based on {hacek over (γ)}in, not Γin.



FIGS. 3A and 3B illustrate diagrams of transfer function models for γin, according to an embodiment. The transfer function model 302 may be utilized for analytic modeling or blind modeling, while the transfer function model 304 may be utilized for the analytic model described below. To find the data-driven cost considering the instantaneous performance of CL-AIT, learning of a transfer function of γin is utilized. It is modeled as a function h(⋅; θ) of a, {hacek over (γ)}bypass, and ω with an unknown generic parameter vector θ. The model h(⋅; θ) is either analytically derived by using the domain knowledge of the tuner (e.g., the tuner topology), or blindly modeled (e.g., by using a neural network). As θ is independent of ω, the memory size to store h(⋅; θ) does not grow with the number of supported channel frequencies.


For the analytic modeling, h(⋅; θ) is modeled to be a composite function of {circumflex over (Γ)}in(⋅; θLC), {circumflex over (Γ)}bypass(⋅; θbypass), and g(⋅; θin), where θLC, θbypass, and θin are model parameters for {circumflex over (Γ)}in, {circumflex over (Γ)}bypass, and g, respectively, with θ=[θLC^T, θbypass^T, θin^T]^T, as shown in 304. While {circumflex over (Γ)}in(⋅; θLC) with {circumflex over (Γ)}bypass(⋅; θbypass) models Γin at the tuner input, g(⋅; θin) captures the transformation of Γin along the RF PCB section 112, bi-directional coupler 110, and feedback receiver 108. The transformation function g(⋅; θin) is empirically modeled to include scaling, rotation, and translation of {circumflex over (Γ)}in(⋅; θLC). Let αin ∈ ℝ, ϕin ∈ ℝ, and bin ∈ ℂ denote unknown scaling, rotation, and translation parameters, which define θin:=[αin, ϕin, bin]^T. Then, γin is modeled as in Equation (2):

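$$\gamma_{\mathrm{in}} \approx \check{\gamma}_{\mathrm{in}} = g\big(\hat{\Gamma}_{\mathrm{in}}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \theta_{\mathrm{bypass}}), \omega; \theta_{LC}); \theta_{\mathrm{in}}\big) := \alpha_{\mathrm{in}} \exp(j\phi_{\mathrm{in}})\, \hat{\Gamma}_{\mathrm{in}}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \theta_{\mathrm{bypass}}), \omega; \theta_{LC}) + b_{\mathrm{in}}. \tag{2}$$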


On the other hand, given the topology of the tuner, {circumflex over (Γ)}in(a,{circumflex over (Γ)}bypass({hacek over (γ)}bypass; θbypass), ω; θLC) is modeled to have the identical form of Γin(a,Γbypass, ω) in Equation (1). The difference in the analytic model is that the L/C component values in Γin(a,Γbypass, ω) are assumed to be unknown model parameters in θLC and are estimated from data, instead of considering their nominal values available in the specification of the tuner. This allows capturing an actual response of the tuner rather than naively relying on its ideal response, and subsequently addressing uncertainties in the hardware. Since the form of {circumflex over (Γ)}in(⋅; θLC) is tuner-model-specific, an example for a reference tuner model is described below with respect to FIG. 6. Furthermore, similar to the modeling of g(⋅; θin), {circumflex over (Γ)}bypass is modeled to be {circumflex over (Γ)}bypass({hacek over (γ)}bypass; θbypass)=αbypass^{-1} exp(−jϕbypass)({hacek over (γ)}bypass−bbypass) with θbypass:=[αbypass, ϕbypass, bbypass]^T.
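The scaling, rotation, and translation model g and the inverse mapping used for the bypass measurement can be written out directly. The sketch below is illustrative only: the function names and all numeric parameter values are hypothetical and are not taken from the disclosure.

```python
import cmath


def g(gamma_hat_in, alpha_in, phi_in, b_in):
    """Scale, rotate, and translate the modeled tuner-input reflection coefficient."""
    return alpha_in * cmath.exp(1j * phi_in) * gamma_hat_in + b_in


def gamma_hat_bypass(gamma_meas_bypass, alpha_bypass, phi_bypass, b_bypass):
    """Undo translation, rotation, and scaling of the measured bypass coefficient."""
    return cmath.exp(-1j * phi_bypass) * (gamma_meas_bypass - b_bypass) / alpha_bypass


# Hypothetical parameter values, chosen only to exercise the two mappings.
alpha, phi, b = 0.8, 0.5, 0.02 - 0.01j
gamma_true = 0.3 + 0.2j
gamma_measured = g(gamma_true, alpha, phi, b)
gamma_recovered = gamma_hat_bypass(gamma_measured, alpha, phi, b)
```

When the same parameters are used in both directions, the second mapping is the exact inverse of the first, so `gamma_recovered` equals `gamma_true` up to rounding.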


For the blind modeling, θ is simply a neural network parameter, which will be determined by a choice of the neural network structure (e.g., feedforward neural network or convolutional neural network).


For either modeling approach, θ is estimated by solving a non-linear least squares problem. Given a training dataset {{hacek over (γ)}in,n, an, {hacek over (γ)}bypass,n, ωn} for n=1, ..., N, θ is found by iteratively solving the problem in Equation (3):












$$\min_{\theta}\; C(\theta) = \sum_{n=1}^{N} \left| \check{\gamma}_{\mathrm{in},n} - h(a_n, \check{\gamma}_{\mathrm{bypass},n}, \omega_n; \theta) \right|^{2}, \tag{3}$$







which can be tackled via gradient-based algorithms in offline/online fashion. Examples of the gradient-based solvers include (stochastic) gradient descent, Gauss-Newton, Levenberg-Marquardt, and adaptive moment (ADAM) algorithms.
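As a rough illustration of gradient-based least squares fitting, the sketch below estimates only the scaling, rotation, and translation parameters from noiseless synthetic pairs, treating the modeled tuner-input reflection coefficients as known. This is a deliberate simplification of the full problem in Equation (3), which also estimates the L/C values; the function name, step size, iteration count, and the translation value are hypothetical, and the rotation of 50 is interpreted as degrees, which is an assumption.

```python
import cmath
import math
import random


def fit_scale_rotation_translation(xs, ys, step=0.5, iters=2000):
    """Gradient descent on the mean of |y_n - (c * x_n + b)|^2 over complex c and b.

    c = alpha * exp(j * phi) carries scaling and rotation; b is the translation.
    """
    c, b = 1 + 0j, 0j
    n = len(xs)
    for _ in range(iters):
        grad_c = grad_b = 0j
        for x, y in zip(xs, ys):
            err = y - (c * x + b)
            grad_c -= err * x.conjugate()  # Wirtinger gradient w.r.t. conj(c)
            grad_b -= err                  # Wirtinger gradient w.r.t. conj(b)
        c -= step * grad_c / n
        b -= step * grad_b / n
    return abs(c), cmath.phase(c), b


# Synthetic, noiseless data (hypothetical translation; phase assumed in degrees).
rng = random.Random(0)
xs = [cmath.rect(0.9 * rng.random(), 2 * math.pi * rng.random()) for _ in range(40)]
true_c = cmath.rect(0.9, math.radians(50.0))
true_b = 0.05 - 0.02j
ys = [true_c * x + true_b for x in xs]

alpha_est, phi_est, b_est = fit_scale_rotation_translation(xs, ys)
```

Plain gradient descent is used here only for brevity; Gauss-Newton, Levenberg-Marquardt, or ADAM would follow the same pattern of repeatedly evaluating residuals and updating the parameters.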


Once the estimate of θ is obtained, denoted as {circumflex over (θ)}, a* can be found by solving Equation (4):











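$$(\mathrm{P1}')\quad a^{*} = \arg\min_{a \in \mathcal{A}} \left| h(a, \check{\gamma}_{\mathrm{bypass}}, \omega; \hat{\theta}) \right|, \tag{4}$$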







which is found by replacing Γin(a,Γbypass, ω) in Equation (1) with h(a,{hacek over (γ)}bypass, ω;{circumflex over (θ)}). Since 𝒜 is finite, a* can be found via exhaustive search. However, exhaustive search is computationally expensive when 𝒜 is large. To cost-effectively determine a* for given {hacek over (γ)}bypass and ω, an HC algorithm according to Table 1 is disclosed. A threshold ξ* can be pre-determined from the target performance measure for the stopping criterion of the algorithm. Recall that the analytic transfer function model of Γin, which is {circumflex over (Γ)}in(a,{circumflex over (Γ)}bypass({hacek over (γ)}bypass;{circumflex over (θ)}bypass), ω;{circumflex over (θ)}LC), is also readily available as a byproduct of the analytic model of h(⋅; θ) (not for the neural network model). Inspired by the conventional cost in Equation (1), then, {circumflex over (Γ)}in(⋅,{circumflex over (Γ)}bypass(⋅;{circumflex over (θ)}bypass),⋅;{circumflex over (θ)}LC) can be used instead of {hacek over (γ)}in=h(⋅;{circumflex over (θ)}) for (P1′) in Equation (4), that is, to find a* by solving











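$$a^{*} = \arg\min_{a \in \mathcal{A}} \left| \hat{\Gamma}_{\mathrm{in}}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \hat{\theta}_{\mathrm{bypass}}), \omega; \hat{\theta}_{LC}) \right|.$$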













TABLE 1

Input: ω, {hacek over (γ)}bypass, h(⋅; {circumflex over (θ)}), Niter, and ξ*
Output: a*
Initialization Step:
    Initialize a(0) ∈ 𝒜 at random
Search Step:
    Set i ← 0
    for i ≤ Niter and |h(a(i), {hacek over (γ)}bypass, ω; {circumflex over (θ)})| > ξ* do
        Construct 𝒩(a(i)) including every a ∈ 𝒜 with XOR(a(i), a) = 1
        Find acandidate = argmin_{a ∈ 𝒩(a(i))} |h(a, {hacek over (γ)}bypass, ω; {circumflex over (θ)})|
        if |h(acandidate, {hacek over (γ)}bypass, ω; {circumflex over (θ)})| ≤ |h(a(i), {hacek over (γ)}bypass, ω; {circumflex over (θ)})|
            Update a(i+1) ← acandidate
        else
            Break
        end
        Set i ← i + 1
    end
    Set a* ← a(i)
    Return a*









The algorithm in Table 1 has low computational complexity, but guarantees only a locally optimal a*. To avoid potentially undesirable local minima, the system may also adopt a random-restart HC (RRHC) algorithm. This is a meta-algorithm built on the standard HC algorithm in that it conducts a series of HC searches via the standard algorithm in Table 1 with randomly initialized a(0) at each attempt, until |h(a*,{hacek over (γ)}bypass, ω;{circumflex over (θ)})|≤ξ*, or the maximum number of restarts Nrestart is reached.
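A minimal sketch of the HC search and its random-restart wrapper is given below. Several points are assumptions rather than statements of the disclosure: the neighborhood condition XOR(a(i), a) = 1 is interpreted as "differs in exactly one bit", the wrapper keeps the best code seen across restarts, and the `cost` callable is a hypothetical stand-in for |h(a, γ̌bypass, ω; θ̂)| (here a toy bit-mismatch count, not a tuner model).

```python
import random


def hill_climb(cost, a0, n_iter, xi_star):
    """Greedy one-bit-flip search in the spirit of Table 1."""
    a = tuple(a0)
    i = 0
    while i <= n_iter and cost(a) > xi_star:
        # Neighborhood: every code at Hamming distance 1 (assumed meaning of XOR = 1).
        neighbors = [a[:k] + (1 - a[k],) + a[k + 1:] for k in range(len(a))]
        candidate = min(neighbors, key=cost)
        if cost(candidate) <= cost(a):
            a = candidate
        else:
            break  # local minimum reached
        i += 1
    return a


def random_restart_hill_climb(cost, n_bits, n_iter, xi_star, n_restart, rng):
    """Repeat hill_climb from random starts until the threshold or restart limit."""
    best = None
    for _ in range(n_restart):
        a0 = tuple(rng.randint(0, 1) for _ in range(n_bits))
        a = hill_climb(cost, a0, n_iter, xi_star)
        if best is None or cost(a) < cost(best):
            best = a  # assumption: keep the best code over all restarts
        if cost(best) <= xi_star:
            break
    return best


# Toy cost: fraction of bits that differ from a hypothetical optimal code.
target = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)


def toy_cost(a):
    return sum(x != y for x, y in zip(a, target)) / len(target)


a_hc = hill_climb(toy_cost, (0,) * 12, n_iter=20, xi_star=0.0)
a_rrhc = random_restart_hill_climb(toy_cost, 12, 20, 0.0, 5, random.Random(0))
```

Each HC iteration evaluates the cost for as many neighbors as there are bits in the code, which is far fewer evaluations than an exhaustive search over all codes.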


Alternatively, an RL algorithm may be utilized that considers the overall flow of CL-AIT operation. With the RL approach, a Markov decision process (MDP) is defined first as a tuple (𝒮, 𝒜, 𝒫_{ss′}^a, ℛ_{ss′}^a, β, 𝒮0), where 𝒮 is a set of states, 𝒜 is a set of actions, 𝒫_{ss′}^a is a transition probability to the next state s′ when action a is taken at state s, ℛ_{ss′}^a is the stochastic reward function to map the sequence ⟨s, a, s′⟩ to r∈ℝ, β∈(0,1] is the discount factor to balance current and future rewards, and 𝒮0 is the distribution over initial states s0. In particular, 𝒫_{ss′}^a and ℛ_{ss′}^a constitute the model of the MDP. As opposed to conventional machine learning, RL is interaction-based learning.



FIG. 4 illustrates evolution of features in RL over time, according to an embodiment. At each time slot t, an agent takes action at at given state st according to a current policy π: 𝒮 → 𝒜. Then, the agent receives a reward rt and the next state st+1, and updates the current policy accordingly. From the correspondence of key features between CL-AIT and RL summarized in Table 2, it can be seen that the operation of RL precisely matches that of CL-AIT in FIG. 2. The block for the CL-AIT performance measure at 230 in FIG. 2 is utilized under the RL framework.









TABLE 2
[Correspondence of features between RL and CL-AIT]

Reinforcement learning | CL-AIT
Agent | Antenna impedance tuner
State s | Reflection coefficient in bypass {hacek over (γ)}bypass and channel frequency ω
Action a | Tuner code a
Reward r | Performance measure: e.g., VSWR^{-1}, −|{hacek over (γ)}in|, total radiated power
Policy π | A strategy to find a*










FIG. 5 illustrates a diagram of a CL-AIT system, according to an embodiment. The CL-AIT 500 includes an antenna 502, an antenna impedance tuner 504, a tuner control algorithm 506, a feedback receiver 508, a bi-directional coupler 510, and an RF PCB 512. By assuming the reward as rt=−|{hacek over (γ)}in,t|, a diagram for CL-AIT under RL is shown in FIG. 5. A stochastic gradient descent (SGD) update 514 may be applied to the tuner control algorithm 506, as further described below.


A goal of RL is to learn the optimal policy π* by solving Equation (5):











$$(\mathrm{P2})\quad \max_{\pi}\; V^{\pi}(s), \quad \forall s \in \mathcal{S} \tag{5}$$







where Vπ(s) is the so-called state value function, defined as in Equation (6):











$$V^{\pi}(s) := \mathbb{E}\left[ \sum_{t=0}^{\infty} \beta^{t}\, \mathcal{R}_{s_t s_{t+1}}^{\pi(s_t)} \,\middle|\, s_0 = s \right]. \tag{6}$$







Vπ(s) captures the asymptotic performance of CL-AIT in the long run by considering the expected sum of discounted rewards. (P2) is essentially equivalent to (P1) when β=0. π* satisfies the Bellman optimality, as in Equation (7):











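$$V^{*}(s) := V^{\pi^{*}}(s) = \max_{a \in \mathcal{A}} \underbrace{\sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} \left[ \mathcal{R}_{ss'}^{a} + \beta V^{*}(s') \right]}_{=:\, Q^{*}(s,a)}, \quad \forall s \in \mathcal{S} \tag{7}$$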







where Q*(s,a) is the optimal state-action value function. Due to the recursion in V*(s), π* can be found via dynamic programming (DP) if the MDP is known. π can be represented as a LUT of size |𝒮|×|𝒜| during learning, where the operator |⋅| stands for the cardinality of a set. However, the MDP model is not known for CL-AIT due to unknown 𝒫_{ss′}^a and ℛ_{ss′}^a. Furthermore, the memory requirement to store π during the learning process is demanding since 𝒮 is continuous and 𝒜 is large. With the aim of scalability, the RL problem is handled via approximate dynamic programming (ADP) to reduce the computational complexity and memory requirement of a solver by dropping the dependency on |𝒮|.


The theory of MDPs states that the action at state s from the greedy policy π can be retrieved via Equation (8):











$$\pi(s) = \arg\max_{a \in \mathcal{A}} Q^{\pi}(s, a), \quad \forall s \in \mathcal{S} \tag{8}$$







with the state-action value function Qπ(s,a) := 𝔼[ℛ_{s s_1}^{a} + Σ_{t=1}^{∞} β^t ℛ_{s_t s_{t+1}}^{π(s_t)} | s_0 = s, a_0 = a]. Therefore, if Q*(s,a) becomes available, a* can be found via Equation (8) without using the explicitly stored π*. However, the expectation involved in Qπ(s,a) cannot be evaluated due to unknown 𝒫_{ss′}^a and ℛ_{ss′}^a for CL-AIT. To bypass this issue, the system learns an approximate model of Q*(s,a) in a sequential manner by using data collected offline or during CL-AIT operation. {circumflex over (Q)}(s,a; θ) denotes an approximate function of Qπ(s,a), which can be modeled via neural networks with parameter vector θ. εt denotes the episode defined as a finite sequence ⟨s0, a0, r0, s1, a1, r1, s2, . . . , st, at, rt⟩ with s0 ∼ 𝒮0 collected up to time slot t. Then, θ can be estimated by solving Equation (9):











$$(\mathrm{P3})\quad \min_{\theta}\; C(\theta) := \frac{1}{t} \sum_{(s_\tau, a_\tau) \in \varepsilon_t} \left( Q^{+}(s_\tau, a_\tau; \theta^{(t-1)}) - \hat{Q}(s_\tau, a_\tau; \theta) \right)^{2}. \tag{9}$$







where








$$Q^{+}(s_t, a_t; \theta) := r_t + \beta \max_{a' \in \mathcal{A}} \hat{Q}(s_{t+1}, a'; \theta)$$











is a target cost-to-go to approximate unavailable true Qπ(st, at). As εt grows over t, the SGD is adopted to efficiently solve (P3) by minimizing the instantaneous squared error at each time slot t, as in Equation (10):











$$\min_{\theta}\; C_t(\theta) := \left( Q^{+}(s_t, a_t; \theta^{(t-1)}) - \hat{Q}(s_t, a_t; \theta) \right)^{2} \tag{10}$$







where θ(t) is the estimate of θ at time slot t. The minimization problem in Equation (10) is processed at each time slot t to sequentially estimate θ with the reduced computational complexity, rather than solving (P3) in Equation (9) in batch fashion. The update rule for θ at time slot t can be found as in Equation (11):

$$\theta^{(t)} \leftarrow \theta^{(t-1)} - \eta\, \nabla_{\theta} C_t(\theta) \big|_{\theta = \theta^{(t-1)}} \tag{11}$$


with a learning rate η>0. The gradient ∇θCt(θ) can be obtained by using well-known backpropagation.


Nε denotes the number of episodes. Further, the ε-greedy algorithm with {circumflex over (Q)}(s,a; θ) is defined, as in Equation (12):











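$$\pi^{\varepsilon}(s; \theta) := \begin{cases} \arg\max_{a \in \mathcal{A}} \hat{Q}(s, a; \theta), & \text{w.p. } 1 - \varepsilon \\ \mathrm{UniformRandom}(\mathcal{A}), & \text{w.p. } \varepsilon \end{cases} \tag{12}$$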







in order to balance exploration and exploitation during the learning process. The RL agent itself actively chooses a at s from π to collect data and then refines the current π. This interaction is unique to RL compared to other learning methods, where data is passively given to the learner. This leads to the Q-learning algorithm summarized in Table 3.









TABLE 3
Q-learning algorithm for AIT

Input: MDP∖{𝒫_{ss′}^a, ℛ_{ss′}^a}, β, ε, Nε, and η
Output: π*
Initialization Step:
    Initialize θ(0) at random
Learning Step:
    for i = 1, ..., Nε do
        Initialize s0 ∈ 𝒮
        Find action a0 = πε(s0; θ(0)) in Equation (12)
        for t = 1, ..., T do
            Take action at, receive rt, and observe the next state st+1
            Q+(st, at; θ(t−1)) ← rt + β max_{a′ ∈ 𝒜} {circumflex over (Q)}(st+1, a′; θ(t−1))
            Set Ct(θ) = (Q+(st, at; θ(t−1)) − {circumflex over (Q)}(st, at; θ))^2
            Compute ∇θCt(θ) by using backpropagation
            θ(t) ← θ(t−1) − η∇θCt(θ)|θ=θ(t−1)
            Find action at+1 = πε(st+1; θ(t)) in Equation (12)
            t ← t + 1
        end for
        Set θ(0) ← θ(T)
    end for
    Set {circumflex over (θ)} ← θ(T)
    π* ← π greedy w.r.t. {circumflex over (Q)}(s, a; {circumflex over (θ)})
Return π*
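The ε-greedy action selection and the per-slot SGD update of the Q-learning algorithm can be sketched as follows. To keep the example self-contained, a linear approximator with a hand-written gradient is swapped in for the neural network and backpropagation; this substitution, the feature map `phi`, and all function names are assumptions made for illustration.

```python
import random


def q_hat(theta, features):
    """Linear stand-in for the approximate state-action value function."""
    return sum(w * f for w, f in zip(theta, features))


def epsilon_greedy(theta, phi, state, actions, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_hat(theta, phi(state, a)))


def q_learning_step(theta, phi, s, a, r, s_next, actions, beta, eta):
    """One SGD step on C_t = (Q+ - Q_hat)^2 with the target Q+ held fixed."""
    target = r + beta * max(q_hat(theta, phi(s_next, a2)) for a2 in actions)
    feats = phi(s, a)
    error = target - q_hat(theta, feats)
    # Gradient of C_t w.r.t. theta is -2 * error * feats for a linear model.
    return [w + 2.0 * eta * error * f for w, f in zip(theta, feats)]


# Toy problem: one state, two actions with fixed hypothetical rewards.
def one_hot(state, action):
    return [1.0 if action == k else 0.0 for k in range(2)]


theta = [0.0, 0.0]
for _ in range(60):
    for action, reward in ((0, -0.5), (1, -0.1)):
        theta = q_learning_step(theta, one_hot, 0, action, reward, 0,
                                [0, 1], beta=0.0, eta=0.25)

best_action = epsilon_greedy(theta, one_hot, 0, [0, 1], 0.0, random.Random(1))
```

With the discount factor set to zero in this toy run, each weight converges to the reward of its action, so the greedy choice is the action with the larger reward.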









Other algorithms may be adopted to get a better descent direction than ∇θCt(θ), since the SGD might suffer from slow convergence. Examples of those algorithms include Levenberg-Marquardt (LM) and adaptive moment (ADAM) algorithms. Furthermore, the Q-learning algorithm in Table 3 can be extended to the online setup by setting Nε=1 and T→∞, which does not require offline training. Lastly, the HC algorithms described herein also can be adopted to cost-efficiently perform

$$\arg\max_{a \in \mathcal{A}} \hat{Q}(s, a; \theta).$$







FIG. 6 illustrates a diagram of a reference tuner model 600, according to an embodiment. Synthetic tests were performed to validate the disclosed algorithms. With the characteristic impedance Z0=50 (Ω), ω=900 (MHz), and a∈{0,1}^12, RF 1135 in FIG. 6 was considered as a reference tuner model 600 with L1=7.5 (nH), L2=10 (nH), L3=7.5 (nH), L4=4.3 (nH), L5=5.1 (nH), C1(a)=(2^3×a5+2^2×a6+2^1×a7+2^0×a8+1)Cmin,1, and C2(a)=(2^3×a9+2^2×a10+2^1×a11+2^0×a12+1)Cmin,2 with Cmin,1=Cmin,2=0.47 (pF).
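The binary-weighted capacitor values can be computed from the tuner code as in the short sketch below. The function names are hypothetical; the code is assumed to be stored as a 12-element 0/1 sequence ordered a1 to a12.

```python
C_MIN = 0.47e-12  # farads, from Cmin,1 = Cmin,2 = 0.47 pF


def binary_weighted_capacitance(bits, c_min=C_MIN):
    """Four control bits, most significant first, select 1x to 16x c_min."""
    b3, b2, b1, b0 = bits
    return (8 * b3 + 4 * b2 + 2 * b1 + b0 + 1) * c_min


def tuner_capacitances(a):
    """Return (C1, C2) for a 12-bit tuner code a = (a1, ..., a12)."""
    c1 = binary_weighted_capacitance(a[4:8])    # bits a5 to a8
    c2 = binary_weighted_capacitance(a[8:12])   # bits a9 to a12
    return c1, c2


c1_min, c2_min = tuner_capacitances((0,) * 12)
c1_max, c2_max = tuner_capacitances((1,) * 12)
```

With these values each capacitor ranges from 0.47 pF (all four bits cleared) to 7.52 pF (all four bits set) in steps of 0.47 pF.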


Given the tuner model, the transfer function of Γin with respect to a, ω, and Γbypass was analytically derived as a generative model as in Equation (13).

















$$\Gamma_{\mathrm{in}}(a, \Gamma_{\mathrm{bypass}}, \omega) = \frac{Y_{\mathrm{in}}^{-1}(a, \Gamma_{\mathrm{bypass}}) - Z_0}{Y_{\mathrm{in}}^{-1}(a, \Gamma_{\mathrm{bypass}}) + Z_0}, \tag{13}$$





where













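$$Y_{\mathrm{in}}(a, \Gamma_{\mathrm{bypass}}) = \frac{1}{j\omega L_1'(a)} + \frac{1}{j\omega \left( L_2'(a) + L_3 \right)} \left\{ 1 - \frac{L_3^{2} \left( Z_{\mathrm{ant}}(\Gamma_{\mathrm{bypass}}) + j\omega L_5'(a) \right)}{L_2'(a) + L_3} \times \left[ Z_{\mathrm{ant}}(\Gamma_{\mathrm{bypass}}) L_4'(a) + j\omega L_4'(a) L_5'(a) + Z_{\mathrm{ant}}(\Gamma_{\mathrm{bypass}}) L_5'(a) + L_3 \left( Z_{\mathrm{ant}}(\Gamma_{\mathrm{bypass}}) + j\omega L_5'(a) \right) \right]^{-1} \right\}^{-1} \tag{14}$$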







with L1′(a):=(a1/L1−ω^2 C1(a))^{-1}, L2′(a):=(1−a2)L2, L4′(a):=(1−a3)L4, and L5′(a):=(a4/L5−ω^2 C2(a))^{-1}, and

$$Z_{\mathrm{ant}}(\Gamma_{\mathrm{bypass}}) = \frac{1 + \Gamma_{\mathrm{bypass}}}{1 - \Gamma_{\mathrm{bypass}}}\, Z_0. \tag{15}$$
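Equations (13) to (15) can be evaluated numerically for the reference tuner values given with FIG. 6. The sketch below is illustrative: the function names are hypothetical, the 900 MHz figure is interpreted as the channel frequency f so that the angular frequency used in the formulas is 2πf (an assumption), and the example tuner code and bypass reflection coefficient are arbitrary.

```python
import math

Z0 = 50.0
L1, L2, L3, L4, L5 = 7.5e-9, 10e-9, 7.5e-9, 4.3e-9, 5.1e-9
C_MIN_1 = C_MIN_2 = 0.47e-12


def z_ant(gamma_bypass):
    """Equation (15): antenna impedance from the bypass reflection coefficient."""
    return (1 + gamma_bypass) / (1 - gamma_bypass) * Z0


def effective_inductances(a, omega):
    """L1', L2', L4', L5' for a 12-bit tuner code a = (a1, ..., a12)."""
    c1 = (8 * a[4] + 4 * a[5] + 2 * a[6] + a[7] + 1) * C_MIN_1
    c2 = (8 * a[8] + 4 * a[9] + 2 * a[10] + a[11] + 1) * C_MIN_2
    l1p = 1.0 / (a[0] / L1 - omega ** 2 * c1)
    l2p = (1 - a[1]) * L2
    l4p = (1 - a[2]) * L4
    l5p = 1.0 / (a[3] / L5 - omega ** 2 * c2)
    return l1p, l2p, l4p, l5p


def gamma_in(a, gamma_bypass, omega):
    """Equations (13) and (14): reflection coefficient at the tuner input."""
    l1p, l2p, l4p, l5p = effective_inductances(a, omega)
    z = z_ant(gamma_bypass)
    d = z + 1j * omega * l5p
    bracket = z * l4p + 1j * omega * l4p * l5p + z * l5p + L3 * d
    inner = 1 - (L3 ** 2 * d / (l2p + L3)) / bracket
    y_in = 1 / (1j * omega * l1p) + 1 / (1j * omega * (l2p + L3) * inner)
    z_in = 1 / y_in
    return (z_in - Z0) / (z_in + Z0)


omega = 2 * math.pi * 900e6  # assumption: 900 MHz is f, so omega = 2*pi*f
code = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1)  # arbitrary example code
gamma = gamma_in(code, 0.3 + 0.2j, omega)
```

Because the modeled network is purely reactive and the antenna load is passive, the magnitude of the result should not exceed one, which gives a quick sanity check on an implementation.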







Then, γin was modeled as a scaled and rotated version of Γin, and so was γbypass, that is, γin=α exp(jϕ)Γin(a,Γbypass, ω) with α=0.9 and ϕ=50, and γbypass={tilde over (α)} exp(j{tilde over (ϕ)})Γbypass with {tilde over (α)}=0.6 and {tilde over (ϕ)}=30. After randomly generating Γbypass, {hacek over (γ)}in and {hacek over (γ)}bypass were found by adding noise to γin and γbypass, respectively. The performance metric was set to Equation (16) with {tilde over (Γ)}bypass({hacek over (γ)}bypass):={tilde over (α)}^{-1} exp(−j{tilde over (ϕ)}){hacek over (γ)}bypass:










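$$\mathrm{VSWR}(a, \check{\gamma}_{\mathrm{bypass}}, \omega) := \frac{1 + \left| \Gamma_{\mathrm{in}}(a, \tilde{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}), \omega) \right|}{1 - \left| \Gamma_{\mathrm{in}}(a, \tilde{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}), \omega) \right|}. \tag{16}$$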
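The VSWR metric depends only on the magnitude of the reflection coefficient, as the following sketch shows. The function names are hypothetical, and the reciprocal is included because VSWR^{-1} is one of the reward choices listed in Table 2.

```python
def vswr(gamma_in):
    """Voltage standing wave ratio for a reflection coefficient with |gamma| < 1."""
    mag = abs(gamma_in)
    return (1.0 + mag) / (1.0 - mag)


def inverse_vswr(gamma_in):
    """Reciprocal VSWR, which lies in (0, 1] and grows as matching improves."""
    return 1.0 / vswr(gamma_in)


perfect_match = vswr(0j)          # no reflection
half_reflected = vswr(0.5 + 0j)   # |gamma| = 0.5
```

A perfectly matched input gives a VSWR of 1, and the ratio grows without bound as the magnitude of the reflection coefficient approaches 1.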







For the algorithms described herein, h(a,{hacek over (γ)}bypass, ω; θ) via analytic modeling particularly considered













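$$\hat{\Gamma}_{\mathrm{in}}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \theta_{\mathrm{bypass}}), \omega; \theta_{LC}) = \frac{\hat{Y}_{\mathrm{in}}^{-1}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \theta_{\mathrm{bypass}}), \omega; \theta_{LC}) - Z_0}{\hat{Y}_{\mathrm{in}}^{-1}(a, \hat{\Gamma}_{\mathrm{bypass}}(\check{\gamma}_{\mathrm{bypass}}; \theta_{\mathrm{bypass}}), \omega; \theta_{LC}) + Z_0}, \tag{17}$$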





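where

$$
\begin{aligned}
\hat{Y}_{\text{in}}\left(a,\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}}),\omega;\theta_{\text{LC}}\right) = {} & \frac{1}{j\omega L_1'(a)} + \frac{1}{j\omega\left(L_2'(a)+L_3\right)} \\
& \times \Bigg\{ 1 - \frac{L_3^2\left(Z_{\text{ant}}\left(\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}})\right) + j\omega L_5'(a)\right)}{L_2'(a)+L_3} \\
& \quad \times \Big[ Z_{\text{ant}}\left(\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}})\right) L_4'(a) + j\omega L_4'(a)\,L_5'(a) \\
& \qquad + Z_{\text{ant}}\left(\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}})\right) L_5'(a) \\
& \qquad + L_3\left(Z_{\text{ant}}\left(\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}})\right) + j\omega L_5'(a)\right) \Big]^{-1} \Bigg\}^{-1} \qquad (18)
\end{aligned}
$$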







with $\theta_{\text{LC}} = [L_1, L_2, L_3, L_4, L_5, C_{\min,1}, C_{\min,2}]^T$ and $Z_{\text{ant}}(\cdot)$ as in Equation (15). The model $\hat{\Gamma}_{\text{in}}(a,\hat{\Gamma}_{\text{bypass}}(\check{\gamma}_{\text{bypass}};\theta_{\text{bypass}}),\omega;\theta_{\text{LC}})$ in Equations (17) and (18) has the same form as $\Gamma_{\text{in}}(a,\Gamma_{\text{bypass}},\omega)$ in Equations (13) and (14), but the L/C component values in $\theta_{\text{LC}}$ are estimated rather than fixed at their nominal values.


$h(a,\check{\gamma}_{\text{bypass}},\omega;\theta)$ via blind modeling and $\hat{Q}(s,a;\theta)$ were modeled using feedforward neural networks with 3 and 2 hidden layers, respectively, each hidden layer having 30 nodes with rectified linear unit (ReLU) activation functions, while the output layer used a purely linear activation function.
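The network shapes described above can be sketched in plain Python as follows. This is only an illustration of the layer structure (ReLU hidden layers with 30 nodes, linear output); the input and output dimensions, the random initialization, and the names `init_mlp` and `forward` are assumptions that do not come from the disclosure, and no training step is shown.

```python
import random


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for consecutive layer sizes, e.g. [7, 30, 30, 30, 2]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Apply ReLU after every hidden layer and keep the output layer purely linear."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:  # hidden layers only
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical input: 4 code components, Re/Im of the measured bypass coefficient, frequency.
h_net = init_mlp([7, 30, 30, 30, 2])  # 3 hidden layers; output assumed to be Re/Im of the estimate
q_net = init_mlp([7, 30, 30, 1])      # 2 hidden layers; scalar state-action value
```

Calling `forward(h_net, features)` with a 7-element feature list returns a 2-element list under these assumed dimensions.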


After estimating $\theta$ offline, via LM for the analytic model and ADAM for the neural network (blind) model, the CL-AIT algorithms were tested over a set of randomly generated measurements of $\check{\gamma}_{\text{bypass}}$. For the HC algorithms, $N_{\text{iter}} = 20$ and $N_{\text{restart}} = 34$ were used. For the Q-learning algorithm, a deterministic reward function was set as $\mathcal{R}_{ss'}^{a} := \mathrm{VSWR}^{-1}(a,\check{\gamma}_{\text{bypass}},\omega)$, and subsequently $r_t = \mathrm{VSWR}^{-1}(a_t,\check{\gamma}_{\text{bypass},t},\omega)$. Furthermore, $\beta = 10^{-3}$ was set to focus mainly on the instantaneous reward. For the ε-greedy policy, the epsilon-first strategy was considered, as in Equation (19).











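$$
\pi_{\varepsilon}(s;\theta) := \begin{cases} \text{UniformRandom}(\mathcal{A}), & \text{for } t \le \varepsilon T \\ \arg\max_{a} \hat{Q}(s,a;\theta), & \text{for } t > \varepsilon T. \end{cases} \qquad (19)
$$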







With such a policy, only exploration occurs during the first $\varepsilon T$ time slots, and only exploitation follows during the remaining $(1-\varepsilon)T$ time slots. For the simulated tests, $\varepsilon = 0.4$ was used to sufficiently explore the large action space $\mathcal{A}$. Then, $a^*$ was found from $\arg\max_{a} \hat{Q}(s,a;\theta)$ via exhaustive search (ES). As competing alternatives, ES over all tuner codes, which finds $\min_{a} \mathrm{VSWR}(a,\check{\gamma}_{\text{bypass}},\omega)$, and a LUT with 24 reference load points were considered. ES is not practical, but shows the performance limit.
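The epsilon-first policy of Equation (19) can be sketched as follows. The function name, argument names, and the `q_value` callable are hypothetical; the greedy step is written as an exhaustive maximum over the action list, matching the exhaustive search mentioned above.

```python
import random


def epsilon_first_action(t, horizon, epsilon, actions, q_value, state, rng=random):
    """Equation (19): explore uniformly while t <= epsilon*T, then act greedily on Q-hat."""
    if t <= epsilon * horizon:
        return rng.choice(actions)  # exploration phase
    return max(actions, key=lambda a: q_value(state, a))  # exploitation phase
```

With the values used in the simulated tests (ε = 0.4 and T = 5×10³), the first 2000 time slots would be pure exploration and the remaining 3000 pure exploitation.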



FIGS. 7A, 7B, and 7C illustrate graphs 700, 702 and 706 showing performance, according to an embodiment. FIGS. 7A-7C show the empirical cumulative distribution functions (CDFs) of VSWR after impedance tuning, achieved by the methods and competing alternatives. Both standard HC and RRHC with the analytic model outperformed the LUT. In particular, RRHC showed performance comparable to ES. For the blind model with the neural network, the standard HC algorithm performed worse than the LUT, whereas RRHC outperformed the LUT and was comparable to ES, as in the case of the analytic model. This implies the existence of undesirable local minima, which the random-restart strategy helps the standard HC algorithm avoid. As explained above, the RL problem (P2) in Equation (3) is approximately equivalent to (P1) in Equation (1) when $\beta \approx 0$. Therefore, the Q-learning algorithm performed similarly to RRHC, even though RL relied only on the data without using any domain knowledge. The performance of the RL approach comes with higher computational complexity than the HC algorithms. As $|\mathcal{A}| = 2^{12}$ for the reference tuner 600 in FIG. 6, the complexity is mainly due to finding greedy actions $a'$ to obtain $Q^{+}$ and $a_{t+1}$, which is on the order of $\mathcal{O}(|\mathcal{A}|)$ at each time slot $t$. For this preliminary test, the per-time-slot elapsed time was $5 \times 10^{-3}$ s on average with $N_{\varepsilon} = 30$ and $T = 5 \times 10^{3}$ for Q-learning in Table 3. One may adopt the HC algorithms to find $a'$ and $a_{t+1}$ with reduced complexity. A more fundamental solution to jointly handle the large state space $\mathcal{S}$ and action space $\mathcal{A}$ may be utilized.
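The effect of random restarts on local minima can be illustrated with a small Python sketch of random-restart hill climbing over 12-bit tuner codes. Several details are assumptions made only for illustration: the neighborhood is taken to be all single-bit flips of the current code, `n_restart` is treated as the number of independent runs, and `cost` stands in for the VSWR predicted by the learned model.

```python
import random


def hill_climb(cost, start, n_bits, n_iter):
    """Greedy descent over single-bit-flip neighbors (an assumed neighborhood)."""
    best, best_cost = start, cost(start)
    for _ in range(n_iter):
        neighbors = [best ^ (1 << b) for b in range(n_bits)]
        cand = min(neighbors, key=cost)
        cand_cost = cost(cand)
        if cand_cost >= best_cost:  # no neighbor improves: local minimum
            break
        best, best_cost = cand, cand_cost
    return best, best_cost


def random_restart_hill_climb(cost, n_bits=12, n_iter=20, n_restart=34, seed=0):
    """Run hill climbing from several random start codes and keep the best result."""
    rng = random.Random(seed)
    results = [hill_climb(cost, rng.randrange(1 << n_bits), n_bits, n_iter)
               for _ in range(n_restart)]
    return min(results, key=lambda r: r[1])
```

Under these assumptions each run evaluates at most 12 neighbors per iteration, so a full search costs far fewer model evaluations than an exhaustive pass over all 2¹² codes.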


The systems and methods disclosed herein outperformed the LUT search for CL-AIT, while addressing practical issues of CL-AIT such as $\check{\gamma}_{\text{bypass}} \neq \Gamma_{\text{bypass}}$ (or $\check{\gamma}_{\text{in}} \neq \Gamma_{\text{in}}$) and hardware imperfections, with the help of data-driven learning techniques.


The systems and methods use a cost function learned from data, not one derived from the ideal response of the tuner, to find the optimal tuner code. The methods are LUT-free. As such, the memory requirement does not grow linearly with the number of channel frequencies to support, as it does for a LUT-based solution. Instead, the memory requirement depends on the size of the model parameter for the cost function, which is independent of the number of channel frequencies. Due to the availability of online algorithms, offline calibration is not necessarily required, and the model can adapt to newly collected data during CL-AIT operation. Conventionally, a cost function is found based on the ideal response of the given configuration of the tuner. In practice, however, there is a mismatch between the actual and ideal responses of the tuner due to uncertainties in hardware, which eventually degrades the efficacy of such a cost function. The data-driven cost functions, on the other hand, take these uncertainties into account by fitting the cost function model to data collected from the device that operates CL-AIT. As CL-AIT operates based on sequential decision making, the reinforcement learning framework fits the CL-AIT task well.



FIG. 8 illustrates a flowchart 800 for a method of determining an optimal tuner code, according to an embodiment. At 802, the system initializes a CL-AIT system. At 804, the system determines whether a transmitted power is above a pre-determined threshold. At 806, when the transmitted power is above the pre-determined threshold, the system determines a bypass input reflection coefficient. At 808, the system determines whether the bypass input reflection coefficient is greater than a bypass threshold. At 810, when the bypass input reflection coefficient is greater than the bypass threshold, the system determines an optimal tuner code based on a tuner code search algorithm. The tuner code search algorithm may be an HC algorithm or an RL algorithm.
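The decision flow of steps 804 to 810 can be sketched in Python as follows. The function and parameter names are hypothetical, the measurement and search routines are passed in as callables, and comparing the magnitude of the bypass input reflection coefficient against the threshold is an assumption about how the comparison is carried out.

```python
from typing import Callable, Optional


def cl_ait_step(
    tx_power: float,
    power_threshold: float,
    measure_bypass_gamma: Callable[[], complex],
    bypass_threshold: float,
    search_tuner_code: Callable[[complex], int],
) -> Optional[int]:
    """One pass through steps 804-810 of FIG. 8; returns a tuner code or None."""
    # 804: only proceed when the transmitted power is above the threshold.
    if tx_power <= power_threshold:
        return None
    # 806: determine the bypass input reflection coefficient.
    gamma_bypass = measure_bypass_gamma()
    # 808: compare against the bypass threshold (magnitude comparison is assumed).
    if abs(gamma_bypass) <= bypass_threshold:
        return None
    # 810: run the tuner code search, e.g. an HC or RL algorithm.
    return search_tuner_code(gamma_bypass)
```

Returning `None` here simply means that no new tuner code is selected in this pass; what the system does in that case is not specified in this passage.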



FIG. 9 illustrates a block diagram of an electronic device 901 in a network environment 900, according to one embodiment. Referring to FIG. 9, the electronic device 901 in the network environment 900 may communicate with an electronic device 902 via a first network 998 (e.g., a short-range wireless communication network), or an electronic device 904 or a server 908 via a second network 999 (e.g., a long-range wireless communication network). The electronic device 901 may communicate with the electronic device 904 via the server 908. The electronic device 901 may include a processor 920, a memory 930, an input device 950, a sound output device 955, a display device 960, an audio module 970, a sensor module 976, an interface 977, a haptic module 979, a camera module 980, a power management module 988, a battery 989, a communication module 990, a subscriber identification module (SIM) 996, or an antenna module 997. In one embodiment, at least one (e.g., the display device 960 or the camera module 980) of the components may be omitted from the electronic device 901, or one or more other components may be added to the electronic device 901. In one embodiment, some of the components may be implemented as a single integrated circuit (IC). For example, the sensor module 976 (e.g., a fingerprint sensor, an iris sensor, or an illuminance sensor) may be embedded in the display device 960 (e.g., a display).


The processor 920 may execute, for example, software (e.g., a program 940) to control at least one other component (e.g., a hardware or a software component) of the electronic device 901 coupled with the processor 920, and may perform various data processing or computations. As at least part of the data processing or computations, the processor 920 may load a command or data received from another component (e.g., the sensor module 976 or the communication module 990) in volatile memory 932, process the command or the data stored in the volatile memory 932, and store resulting data in non-volatile memory 934. The processor 920 may include a main processor 921 (e.g., a central processing unit (CPU) or an application processor (AP)), and an auxiliary processor 923 (e.g., a graphics processing unit (GPU), an image signal processor (ISP), a sensor hub processor, or a communication processor (CP)) that is operable independently from, or in conjunction with, the main processor 921. Additionally or alternatively, the auxiliary processor 923 may be adapted to consume less power than the main processor 921, or execute a particular function. The auxiliary processor 923 may be implemented as being separate from, or a part of, the main processor 921.


The auxiliary processor 923 may control at least some of the functions or states related to at least one component (e.g., the display device 960, the sensor module 976, or the communication module 990) among the components of the electronic device 901, instead of the main processor 921 while the main processor 921 is in an inactive (e.g., sleep) state, or together with the main processor 921 while the main processor 921 is in an active state (e.g., executing an application). According to one embodiment, the auxiliary processor 923 (e.g., an image signal processor or a communication processor) may be implemented as part of another component (e.g., the camera module 980 or the communication module 990) functionally related to the auxiliary processor 923.


The memory 930 may store various data used by at least one component (e.g., the processor 920 or the sensor module 976) of the electronic device 901. The various data may include, for example, software (e.g., the program 940) and input data or output data for a command related thereto. The memory 930 may include the volatile memory 932 or the non-volatile memory 934.


The program 940 may be stored in the memory 930 as software, and may include, for example, an operating system (OS) 942, middleware 944, or an application 946.


The input device 950 may receive a command or data to be used by another component (e.g., the processor 920) of the electronic device 901, from the outside (e.g., a user) of the electronic device 901. The input device 950 may include, for example, a microphone, a mouse, or a keyboard.


The sound output device 955 may output sound signals to the outside of the electronic device 901. The sound output device 955 may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as playing multimedia or recording, and the receiver may be used for receiving an incoming call. According to one embodiment, the receiver may be implemented as being separate from, or a part of, the speaker.


The display device 960 may visually provide information to the outside (e.g., a user) of the electronic device 901. The display device 960 may include, for example, a display, a hologram device, or a projector and control circuitry to control a corresponding one of the display, hologram device, and projector. According to one embodiment, the display device 960 may include touch circuitry adapted to detect a touch, or sensor circuitry (e.g., a pressure sensor) adapted to measure the intensity of force incurred by the touch.


The audio module 970 may convert a sound into an electrical signal and vice versa. According to one embodiment, the audio module 970 may obtain the sound via the input device 950, or output the sound via the sound output device 955 or a headphone of an external electronic device 902 directly (e.g., wired) or wirelessly coupled with the electronic device 901.


The sensor module 976 may detect an operational state (e.g., power or temperature) of the electronic device 901 or an environmental state (e.g., a state of a user) external to the electronic device 901, and then generate an electrical signal or data value corresponding to the detected state. The sensor module 976 may include, for example, a gesture sensor, a gyro sensor, an atmospheric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.


The interface 977 may support one or more specified protocols to be used for the electronic device 901 to be coupled with the external electronic device 902 directly (e.g., wired) or wirelessly. According to one embodiment, the interface 977 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, a secure digital (SD) card interface, or an audio interface.


A connecting terminal 978 may include a connector via which the electronic device 901 may be physically connected with the external electronic device 902. According to one embodiment, the connecting terminal 978 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).


The haptic module 979 may convert an electrical signal into a mechanical stimulus (e.g., a vibration or a movement) or an electrical stimulus which may be recognized by a user via tactile sensation or kinesthetic sensation. According to one embodiment, the haptic module 979 may include, for example, a motor, a piezoelectric element, or an electrical stimulator.


The camera module 980 may capture a still image or moving images. According to one embodiment, the camera module 980 may include one or more lenses, image sensors, image signal processors, or flashes.


The power management module 988 may manage power supplied to the electronic device 901. The power management module 988 may be implemented as at least part of, for example, a power management integrated circuit (PMIC).


The battery 989 may supply power to at least one component of the electronic device 901. According to one embodiment, the battery 989 may include, for example, a primary cell which is not rechargeable, a secondary cell which is rechargeable, or a fuel cell.


The communication module 990 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 901 and the external electronic device (e.g., the electronic device 902, the electronic device 904, or the server 908) and performing communication via the established communication channel. The communication module 990 may include one or more communication processors that are operable independently from the processor 920 (e.g., the AP) and support a direct (e.g., wired) communication or a wireless communication. According to one embodiment, the communication module 990 may include a wireless communication module 992 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 994 (e.g., a local area network (LAN) communication module or a power line communication (PLC) module). A corresponding one of these communication modules may communicate with the external electronic device via the first network 998 (e.g., a short-range communication network, such as Bluetooth™, wireless-fidelity (Wi-Fi) direct, or a standard of the Infrared Data Association (IrDA)) or the second network 999 (e.g., a long-range communication network, such as a cellular network, the Internet, or a computer network (e.g., LAN or wide area network (WAN))). These various types of communication modules may be implemented as a single component (e.g., a single IC), or may be implemented as multiple components (e.g., multiple ICs) that are separate from each other. The wireless communication module 992 may identify and authenticate the electronic device 901 in a communication network, such as the first network 998 or the second network 999, using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module 996.


The antenna module 997 may transmit or receive a signal or power to or from the outside (e.g., the external electronic device) of the electronic device 901. According to one embodiment, the antenna module 997 may include one or more antennas, and, therefrom, at least one antenna appropriate for a communication scheme used in the communication network, such as the first network 998 or the second network 999, may be selected, for example, by the communication module 990 (e.g., the wireless communication module 992). The signal or the power may then be transmitted or received between the communication module 990 and the external electronic device via the selected at least one antenna.


At least some of the above-described components may be mutually coupled and communicate signals (e.g., commands or data) therebetween via an inter-peripheral communication scheme (e.g., a bus, a general purpose input and output (GPIO), a serial peripheral interface (SPI), or a mobile industry processor interface (MIPI)).


According to one embodiment, commands or data may be transmitted or received between the electronic device 901 and the external electronic device 904 via the server 908 coupled with the second network 999. Each of the electronic devices 902 and 904 may be a device of the same type as, or a different type from, the electronic device 901. All or some of operations to be executed at the electronic device 901 may be executed at one or more of the external electronic devices 902, 904, or 908. For example, if the electronic device 901 should perform a function or a service automatically, or in response to a request from a user or another device, the electronic device 901, instead of, or in addition to, executing the function or the service, may request the one or more external electronic devices to perform at least part of the function or the service. The one or more external electronic devices receiving the request may perform the at least part of the function or the service requested, or an additional function or an additional service related to the request, and transfer an outcome of the performing to the electronic device 901. The electronic device 901 may provide the outcome, with or without further processing of the outcome, as at least part of a reply to the request. To that end, a cloud computing, distributed computing, or client-server computing technology may be used, for example.


One embodiment may be implemented as software (e.g., the program 940) including one or more instructions that are stored in a storage medium (e.g., internal memory 936 or external memory 938) that is readable by a machine (e.g., the electronic device 901). For example, a processor of the electronic device 901 may invoke at least one of the one or more instructions stored in the storage medium, and execute it, with or without using one or more other components under the control of the processor. Thus, a machine may be operated to perform at least one function according to the at least one instruction invoked. The one or more instructions may include code generated by a compiler or code executable by an interpreter. A machine-readable storage medium may be provided in the form of a non-transitory storage medium. The term “non-transitory” indicates that the storage medium is a tangible device, and does not include a signal (e.g., an electromagnetic wave), but this term does not differentiate between where data is semi-permanently stored in the storage medium and where the data is temporarily stored in the storage medium.


According to one embodiment, a method of the disclosure may be included and provided in a computer program product. The computer program product may be traded as a product between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or be distributed (e.g., downloaded or uploaded) online via an application store (e.g., Play Store™), or between two user devices (e.g., smart phones) directly. If distributed online, at least part of the computer program product may be temporarily generated or at least temporarily stored in the machine-readable storage medium, such as memory of the manufacturer's server, a server of the application store, or a relay server.


According to one embodiment, each component (e.g., a module or a program) of the above-described components may include a single entity or multiple entities. One or more of the above-described components may be omitted, or one or more other components may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may still perform one or more functions of each of the plurality of components in the same or similar manner as they are performed by a corresponding one of the plurality of components before the integration. Operations performed by the module, the program, or another component may be carried out sequentially, in parallel, repeatedly, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.


Although certain embodiments of the present disclosure have been described in the detailed description of the present disclosure, the present disclosure may be modified in various forms without departing from the scope of the present disclosure. Thus, the scope of the present disclosure shall not be determined merely based on the described embodiments, but rather determined based on the accompanying claims and equivalents thereto.

Claims
  • 1. A method in a closed-loop antenna impedance tuning (CL-AIT) system, comprising: determining whether a transmitted power is above a pre-determined threshold; when the transmitted power is above the pre-determined threshold, determining a bypass input reflection coefficient; determining whether the bypass input reflection coefficient is greater than a bypass threshold by directly comparing the bypass input reflection coefficient and the bypass threshold; and when the bypass input reflection coefficient is greater than the bypass threshold, determining an optimal tuner code based on a tuner code search algorithm.
  • 2. The method of claim 1, wherein the tuner search code algorithm comprises a hill-climbing (HC) algorithm.
  • 3. The method of claim 2, further comprising determining a transfer function model.
  • 4. The method of claim 3, wherein the transfer function model is determined analytically based on tuner topology.
  • 5. The method of claim 3, wherein the transfer function model is determined blindly based on at least one neural network.
  • 6. The method of claim 1, wherein the tuner search code algorithm comprises a reinforcement learning (RL) algorithm.
  • 7. The method of claim 6, further comprising performing a CL-AIT performance measurement.
  • 8. The method of claim 6, the RL algorithm comprises learning an optimal policy to determine the optimal tuner code.
  • 9. The method of claim 8, wherein the RL algorithm comprises determining an optimal state-action value function.
  • 10. The method of claim 6, wherein the RL algorithm utilizes approximate dynamic programming (ADP).
  • 11. A closed-loop antenna impedance tuning (CL-AIT) system, comprising: a memory; and a processor configured to: determine whether a transmitted power is above a pre-determined threshold; when the transmitted power is above the pre-determined threshold, determine a bypass input reflection coefficient; determine whether the bypass input reflection coefficient is greater than a bypass threshold by directly comparing the bypass input reflection coefficient and the bypass threshold; and when the bypass input reflection coefficient is greater than the bypass threshold, determine an optimal tuner code based on a tuner code search algorithm.
  • 12. The system of claim 11, wherein the tuner search code algorithm comprises a hill-climbing (HC) algorithm.
  • 13. The system of claim 12, wherein the processor is further configured to determine a transfer function model.
  • 14. The system of claim 13, the transfer function model is determined analytically based on tuner topology.
  • 15. The system of claim 13, wherein the transfer function model is determined blindly based on at least one neural network.
  • 16. The system of claim 11, wherein the tuner search code algorithm comprises a reinforcement learning (RL) algorithm.
  • 17. The system of claim 16, wherein the processor is further configured to perform a CL-AIT performance measurement.
  • 18. The system of claim 16, the RL algorithm comprises learning an optimal policy to determine the optimal tuner code.
  • 19. The system of claim 18, wherein the RL algorithm comprises determining an optimal state-action value function.
  • 20. The system of claim 16, wherein the RL algorithm utilizes approximate dynamic programming (ADP).
PRIORITY

This application is based on and claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application filed on Sep. 9, 2020 and assigned Ser. No. 63/075,987, and to U.S. Provisional Patent Application filed on Sep. 16, 2020 and assigned Ser. No. 63/079,080, the entire contents of which are incorporated herein by reference.

US Referenced Citations (20)
Number Name Date Kind
6118409 Pietsch Sep 2000 A
9031523 Anderson May 2015 B2
9332471 Huang et al. May 2016 B2
9401738 Wang et al. Jul 2016 B2
10410834 Ryu Sep 2019 B1
20030076168 Forrester Apr 2003 A1
20050093624 Forrester May 2005 A1
20070093282 Chang Apr 2007 A1
20120038524 Song Feb 2012 A1
20120154070 Camp, Jr. Jun 2012 A1
20130052967 Black Feb 2013 A1
20140091968 Harel Apr 2014 A1
20140175896 Suzuki Jun 2014 A1
20140222997 Mermoud et al. Aug 2014 A1
20150195126 Vasseur et al. Jul 2015 A1
20150236728 Suzuki Aug 2015 A1
20170221032 Mazed Aug 2017 A1
20170346178 Shi Nov 2017 A1
20200144973 Gunzner May 2020 A1
20210218430 Han Jul 2021 A1
Foreign Referenced Citations (2)
Number Date Country
110147590 Aug 2019 CN
110365425 Oct 2019 CN
Non-Patent Literature Citations (6)
Entry
Q. Gu et al., “A new method for matching network adaptive control,” IEEE Trans. Microw. Theory Techn., vol. 61, No. 1, pp. 587-595, 2013.
Y. Li, et al., “An automatic impedance matching method based on the feedforward-backpropagation neural network for a WPT System,” IEEE Trans. Ind. Electron., vol. 66, No. 5, pp. 3963-3972, 2019.
K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Q. Appl. Math., vol. 2, No. 2, pp. 164-168, 1944.
D. P. Kingma et al., “ADAM: A method for stochastic optimization,” in Int. Conf. Learning Representations (ICLR), San Diego, CA, 2015, pp. 15.
R. S. Sutton et al., Reinforcement Learning: An Introduction, Cambridge, MA: The MIT Press, 2017, pp. 445.
D. E. Rumelhart et al., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations Cambridge, MA, USA: The MIT Press, 1986, pp. 318-362.
Related Publications (1)
Number Date Country
20220078722 A1 Mar 2022 US
Provisional Applications (2)
Number Date Country
63079080 Sep 2020 US
63075987 Sep 2020 US