This application is related to U.S. patent application Ser. No. 14/285,811, “Automatic Train Stop Control System,” filed on May 23, 2014 by Di Cairano et al., incorporated herein by reference. There, a train is stopped at a predetermined position by constraining a velocity of the train to form a feasible area for a state of the train during movement.
This invention relates generally to stopping a train automatically at a predetermined range of positions, and more particularly to dual control where an identification and a control of an uncertain system are performed concurrently.
A Train Automatic Stopping Controller (TASC) is an integral part of an Automatic Train Operation (ATO) system. The TASC performs automatic braking to stop a train at a predetermined range of positions. ATO systems are of particular importance for train systems where train doors need to be aligned with platform doors, see the related Application, and Di Cairano et al., “Soft-landing control by control invariance and receding horizon control,” American Control Conference (ACC), pp. 784-789, 2014.
However, the transient performance of the train, i.e., the trajectory to the predetermined position, can be adversely affected by uncertainties in dynamic constraints used to model the train. These uncertainties can be attributed to the train mass, brake actuators time constants, and track friction. In many applications, estimating the uncertainties ahead of time (offline) is not possible due to numerous factors, such as expensive operational downtime, the time-consuming nature of the task, and the fact that certain parameters, such as mass and track friction, vary during operation of the train.
Therefore, the parameter estimation should be performed online (in real-time) and in a closed-loop, that is, while the ATO system operates. Major challenges for closed-loop estimation of dynamic systems include conflicting objectives of the control problem versus the parameter estimation, also called identification or learning, problem.
The control objective is to regulate a dynamic system behavior by rejecting the input and output disturbances, and to satisfy the dynamic system constraints. The identification objective is to determine the actual value of the dynamic system parameters, which is performed by comparing the actual behavior with the expected behavior of the dynamic system. That amounts to analyzing how the system reacts to the disturbances.
Hence, the action of the control that cancels the effects of the disturbances makes the identification more difficult. On the other hand, letting the disturbances act uncontrolled to excite the dynamic system, which improves parameter estimation, makes a subsequent application of the control more difficult, because the disturbances may have significantly changed the behavior of the system from the desired behavior, and recovery may be impossible.
For instance, the TASC may compensate for the uncertain parameters such as friction and mass by actions of traction and brakes, so that the train stops precisely at the desired location regardless of whether the train parameters are estimated correctly. Thus, the dynamic system representing the train behaves close to what is expected, and the estimation algorithm does not see a major difference between the desired behavior and the actual behavior of the train. Hence, it is difficult for the estimation algorithm to estimate the unknown parameters. Moreover, even if the train behavior is close to the desired and the expected behaviors, this may be achieved by a large action of the TASC on brakes and traction, which results in unnecessary energy consumption and in jerk, which compromises ride quality.
On the other hand, letting the train dynamic system operate without control for some time may result in differences between the expected and actual behavior with subsequent good estimation, but when the control is re-engaged the train behavior may be too far from the desired one for the latter to be recovered, or it may cost an excessive amount of energy and jerk to recover.
Finally, in general there is no guarantee that the external disturbances cause enough effect on the train behavior to allow for correct estimation of the parameters, due to their random and uncontrolled nature. That is, it is not guaranteed that the external disturbances persistently excite the train system.
Therefore, it is desired to precisely stop the train within a predetermined range of positions, while estimating the actual train system parameters to improve performance metrics, such as minimal jerk, energy, or time, by continuously updating the model in real-time. To this end, a system and method are needed for combined estimation and control that achieve:
To assure system parameter estimation, constraint satisfaction, and performance optimization, a model predictive control (MPC) with dual objective can be designed, see the related application Ser. No. 14/285,811; Genceli et al., “New approach to constrained predictive control with simultaneous model identification,” AIChE Journal, vol. 42, no. 10, pp. 2857-2868, 1996; Marafioti et al., “Persistently exciting model predictive control using FIR models,” International Conference Cybernetics and Informatics, no. 2009, pp. 1-10, 2010; Rathouský et al., “MPC-based approximate dual controller by information matrix maximization,” International Journal of Adaptive Control and Signal Processing, vol. 27, no. 11, pp. 974-999, 2013; Heirung et al., “An MPC approach to dual control,” 10th International Symposium on Dynamics and Control of Process Systems (DYCOPS), 2013; Heirung et al., “An adaptive model predictive dual controller,” Adaptation and Learning in Control and Signal Processing, vol. 11, no. 1, pp. 62-67, 2013; and Weiss et al., “Robust dual control MPC with guaranteed constraint satisfaction,” Proceedings of IEEE Conference on Decision and Control, Los Angeles, Calif., December 2014.
In part, the performance of the parameter estimation depends on whether the effect of external actions on the system is sufficiently visible, that is, whether the system is persistently excited and sufficient information is measured. Thus, for obtaining fast estimation of the system parameters, the action of the dual MPC is selected to trade off the system excitation and control objective optimization. To achieve such a desired tradeoff between regulation and identification, an optimization cost function J can be expressed as
$J = J_c + \gamma\,\psi(U)$, (1)
where J is a linear combination of the control-oriented cost $J_c$ and the residual uncertainty $\psi(U)$ (or conversely the gained information) due to applying a sequence of inputs U, and $\gamma$ is a weighting function of an estimation error that trades off between control and learning objectives. Optimizing cost function (1) subject to system constraints results in an active learning method in which the controller generates inputs to regulate the system, while exciting the system to measure information required for estimating the system parameters.
The weighting function should favor learning over regulation when the estimated value of the unknown parameters is unreliable. As more information is obtained and the estimated value of the unknown parameters becomes reliable, control should be favored over learning, by decreasing the value of function γ.
Possible definitions of $\psi(U)$ include
$\psi(U) = \sum_{i=1}^{\Gamma} \operatorname{trace}(P_i)$, (2a)
$\psi(U) = -\log\det(R_\Gamma)$, (2b)
$\psi(U) = \lambda_{\min}(R_\Gamma - R_0)$, and (2c)
$\psi(U) = \sum_{i=1}^{v} \exp(-R_{ii})$, (2d)
where P is the covariance matrix of the unknown parameters, trace returns the sum of the elements on the main diagonal of P, R is the information matrix of the unknown parameters ($R = P^{-1}$), $\Gamma$ is a learning time horizon, v is the number of unknown parameters, and det and exp denote the determinant and the exponential, respectively.
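As a rough illustration of how the candidate measures (2a)-(2d) behave, the following sketch evaluates them for a system with two unknown parameters. It is only a toy under stated assumptions: the helper names (`det2`, `inv2`, `lambda_min_sym2`, `uncertainty_measures`) and the numeric matrices are hypothetical, the matrices are restricted to symmetric 2x2 so that no linear algebra library is needed, and the covariances are obtained as inverses of the information matrices.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix (assumes det2(m) != 0)."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def lambda_min_sym2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    mean = 0.5 * (m[0][0] + m[1][1])
    return mean - math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])

def uncertainty_measures(r_list):
    """Evaluate (2a)-(2d) for information matrices [R_0, R_1, ..., R_Gamma]."""
    r0, r_gamma = r_list[0], r_list[-1]
    diff = [[r_gamma[i][j] - r0[i][j] for j in range(2)] for i in range(2)]
    covariances = [inv2(r) for r in r_list[1:]]  # P_i = R_i^{-1}, i = 1..Gamma
    return {
        "2a": sum(p[0][0] + p[1][1] for p in covariances),
        "2b": -math.log(det2(r_gamma)),
        "2c": lambda_min_sym2(diff),
        "2d": sum(math.exp(-r_gamma[i][i]) for i in range(2)),
    }

# Hypothetical data: R_0 = I and a single learning step (Gamma = 1).
EXAMPLE_R = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 1.0], [1.0, 2.0]]]
EXAMPLE_PSI = uncertainty_measures(EXAMPLE_R)
```

In a dual MPC the information matrices would depend on the candidate input sequence U through the regressors, which is what makes these measures non-convex in U.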
Unfortunately, all measures in (2a)-(2d) are non-convex in the decision variable U. This turns a conventional convex control problem into a non-convex nonlinear programming problem for which convergence to a global optimum cannot be guaranteed. Furthermore, the weighting function $\gamma$ has a significant effect on the control input U. It is known that the reference generation problem can be converted to a convex problem. For example, Rathouský et al. use an approach that conducts the reference generation optimization over a $\Gamma$-step learning time horizon, which includes $\Gamma - 1$ previous input steps and only a single step in the future.
Heirung et al., “An adaptive model predictive dual controller,” use $\sum_{i=1}^{v} \exp(-R_{ii})$ as a measure of information about the system parameters. That function is used to augment the model predictive cost function. However, to avoid the problems introduced by the non-convexity of that information measure, the minimization of the term is considered over a 1-step learning time horizon. That method also provides the necessary condition for the weighting parameter $\gamma$ to guarantee that the generated reference provides sufficient excitation to learn system parameters. The application of a 1-step learning time horizon prevents optimization of the overall system performance, which in general requires a longer time horizon.
Another method provides an approximate solution for simultaneous estimation and control, based on dynamic programming for static linear systems with a quadratic cost function, see Lobo et al., “Policies for simultaneous estimation and optimization,” Proceedings of the American Control Conference, June 1999. While the approximate solution can improve the system performance, it cannot be easily applied to dynamic systems, such as ATO systems, and it requires significant computations, which may be too slow or may require too expensive hardware to be executed in ATO.
The embodiments of the invention provide a system and method for stopping a train at a predetermined position while optimizing certain performance metrics, which require the estimation of the train parameters. The method uses dual control where an identification and control of an uncertain system are performed concurrently.
The method uses a control invariant set to enforce soft landing constraints, and a constrained recursive least squares procedure to estimate the unknown parameters.
An excitation input sequence reference generator generates a reference input sequence that is repeatedly determined to provide the system with sufficient excitation, and thus to improve the estimation of the unknown parameters. The excitation input sequence reference generator computes the reference input sequence by solving a sequence of convex problems that relax a single non-convex problem.
The selection of the command input that optimizes the system performance is performed by solving a constrained finite time horizon optimal control problem with a time horizon greater than 1, where the constraints include the control invariant set constraints. To ensure convergence of the estimates of the unknown parameters, we include an additional term in the cost function of the finite time horizon optimal control problem accounting for the difference between the command input sequence and the reference input sequence.
The finite time horizon optimal control problem is solved in a model predictive control (MPC). Thus, MPC uses the excitation input sequence and current estimates of unknown parameters to determine the system input u(k), i.e., the command input, which results, for instance, in commands to train traction and brake. Due to the additional term in the cost function minimizing the deviation of the input from the excitation input, the MPC provides the required excitation for improving parameter estimation.
After the input is applied to uncertain train dynamics, the train state information and input information are used in a parameter estimator to update the estimates of the unknown parameters.
As shown in
In the conventional single-step formulation as described in the background section, the learning and control objectives are combined to form an augmented optimization problem, such as the optimization cost function in equation (1).
In the two-step formulation according to embodiments of the invention, the problem of generating the excitation input 202 is solved first. This is followed by solving the control problem in the controller 215, which is modified to account for the solution of the excitation input generation problem.
Description of the Uncertain Train Dynamics
This invention addresses uncertain train systems that can be represented as a disturbed polytopic linear difference inclusion (dpLDI) system.
The model of the dynamics of the train is
$x(k+1) = A_r x(k) + B_r u(k) + B_w w_r$, (3)
where $x \in \mathbb{R}^n$
As shown in
The details of the procedure for expressing an uncertain system in the form of equation (7) below are described in the related Application, i.e.,
$A_r = \sum_{i=1}^{l} \theta_i A_i,\quad B_r = \sum_{i=1}^{l} \theta_i B_i,\quad w_r = \sum_{i=1}^{p} \eta_i w_i$, (4)
where $\theta_i$ are coefficients of a convex combination and represent the unknown parameters for the system dynamics, and $\eta_i$ are coefficients of a convex combination and represent the unknown parameters for the disturbance vector, and they satisfy
$\sum_{i=1}^{l} \theta_i = 1,\ \theta_i \ge 0,\quad \sum_{i=1}^{p} \eta_i = 1,\ \eta_i \ge 0.$
Because the value of the parameters θi, ηi is unknown, an estimate of the model is used
$x(k+1) = \hat{A} x(k) + \hat{B} u(k) + B_w \hat{w}$, (5)
$\hat{A} = \sum_{i=1}^{l} \hat{\theta}_i A_i,\quad \hat{B} = \sum_{i=1}^{l} \hat{\theta}_i B_i,\quad \hat{w} = \sum_{i=1}^{p} \hat{\eta}_i w_i$, (6a)
$\sum_{i=1}^{l} \theta_i = 1,\ \theta_i \ge 0,\quad \sum_{i=1}^{p} \eta_i = 1,\ \eta_i \ge 0$ (6b)
where $\hat{\theta}_i$ are estimates of the unknown parameters for the system dynamics, and $\hat{\eta}_i$ are estimates of the unknown parameters for the disturbance vector.
The estimate of the parameters, and hence the estimate of the model, changes as the estimation algorithm obtains more information about the operation of the train.
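The estimated model in equations (5) and (6a) can be sketched as a one-step predictor that blends vertex models with the current parameter estimates. This is a minimal sketch under assumptions: the function names (`convex_combination`, `matvec`, `predict_next_state`) are hypothetical, matrices are plain nested lists, and the two-vertex numbers below are invented toy values rather than real train data.

```python
def convex_combination(weights, matrices):
    """Return sum_i weights[i] * matrices[i] after checking the simplex constraints."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[r][c] for w, m in zip(weights, matrices))
             for c in range(cols)] for r in range(rows)]

def matvec(matrix, vector):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def predict_next_state(theta_hat, eta_hat, a_vertices, b_vertices,
                       w_vertices, b_w, x, u):
    """One-step prediction x(k+1) = A_hat x + B_hat u + B_w w_hat."""
    a_hat = convex_combination(theta_hat, a_vertices)
    b_hat = convex_combination(theta_hat, b_vertices)
    # Disturbance vertices are vectors; wrap each as a one-column matrix.
    w_cols = [[[entry] for entry in w] for w in w_vertices]
    w_hat = [row[0] for row in convex_combination(eta_hat, w_cols)]
    terms = (matvec(a_hat, x), matvec(b_hat, u), matvec(b_w, w_hat))
    return [sum(parts) for parts in zip(*terms)]

# Toy data (hypothetical): state = [position, velocity], one input, one disturbance.
A_VERTICES = [[[1.0, 0.1], [0.0, 1.0]], [[1.0, 0.1], [0.0, 0.9]]]
B_VERTICES = [[[0.0], [0.1]], [[0.0], [0.2]]]
W_VERTICES = [[0.0], [-0.5]]
B_W = [[0.0], [0.1]]
NEXT_STATE = predict_next_state([0.5, 0.5], [0.5, 0.5], A_VERTICES, B_VERTICES,
                                W_VERTICES, B_W, x=[0.0, 10.0], u=[-1.0])
```

As the estimates change, only the weights passed to `predict_next_state` change; the vertex matrices stay fixed.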
System Constraints and Soft-Landing Cone
TASC may need to enforce a number of constraints on the train operations. These include maximal and minimal velocity and acceleration, ranges for the forces in the actuators, etc. A particular set of constraints is the soft-landing cone.
The soft-landing cone for the TASC problem is a set of constraints defining allowed combinations of train position and train velocity that, if always enforced, guarantees that the train will stop in the desired range of positions $\varepsilon_{tgt}$. The soft-landing cone for the TASC problem and the computation of the control invariant set under uncertain train parameters are described in the related Application.
$H_x^{\infty} x + H_u^{\infty} u \le k^{\infty}$. (7)
The constraints of the control invariant set are such that if the constraints are satisfied, the train operating constraints and the soft-landing cone constraints are satisfied. Furthermore, TASC can always find a selection of the braking and traction controls that satisfies the control invariant set constraints, hence stopping occurs precisely in the desired range of positions. In certain embodiments of this invention the constraints in equation (7) may also include additional constraints on the operation of the train.
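Checking whether a state and input pair satisfies linear constraints of the form of equation (7) amounts to evaluating each row of the inequality. The sketch below assumes a hypothetical helper `satisfies_invariant_set` and invented toy matrices (a velocity band proportional to the distance to the target, plus an input bound); the actual invariant set is computed as described in the related Application and is not reproduced here.

```python
def satisfies_invariant_set(h_x, h_u, k, x, u, tol=1e-9):
    """Return True if H_x x + H_u u <= k holds row by row."""
    for row_x, row_u, bound in zip(h_x, h_u, k):
        lhs = (sum(a * b for a, b in zip(row_x, x))
               + sum(a * b for a, b in zip(row_u, u)))
        if lhs > bound + tol:
            return False
    return True

# Toy constraints (hypothetical): x = [distance to target d, velocity v].
#   v <= 0.5 d,  v >= 0.1 d,  -1 <= u <= 1
H_X = [[-0.5, 1.0], [0.1, -1.0], [0.0, 0.0], [0.0, 0.0]]
H_U = [[0.0], [0.0], [1.0], [-1.0]]
K = [0.0, 0.0, 1.0, 1.0]

INSIDE = satisfies_invariant_set(H_X, H_U, K, x=[10.0, 3.0], u=[-0.5])
TOO_FAST = satisfies_invariant_set(H_X, H_U, K, x=[10.0, 6.0], u=[-0.5])
```

In the controller the same inequality is not merely checked but imposed as a constraint on the predicted states and inputs.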
Two-Steps Dual Control MPC for Train Automated Stopping Control
The reference generator determines a sequence of excitation inputs (Uexc) 202. The controller 215 receives the uncertain model 204, the estimate of the unknown parameters 201, the state 206, and the constraints 203, for instance in the form described by equation (7). The controller 215 also receives the sequence of excitation inputs 202, a control-oriented cost function 210, and a parameter estimate reliability 212 produced by the parameter estimator 213, and produces a command input u 211 for the train that represents the action to be applied to the traction-brake actuator 220.
The command input 211 is also provided to the parameter estimator 213, which uses the command input together with the state 206 to compute the expected movement of the train, resulting in an expected future state of the train. The parameter estimator compares the expected future state of the train with the state of the train 206 at a future time to adjust the estimate of the unknown parameters.
The current estimate of the train model 302, the current cost function 312, the current state 206 and the constraints 321 from 203 are used in the command computation 331 to obtain a sequence of future train command inputs. The command selection 341 selects the first in time element of the future sequence of commands as the train command input 211.
First, from the state 206 and the previously predicted future state, which is based on the past state, the past parameter estimate, and the command input 211, the parameter estimate 201 is updated 401, and a parameter estimate reliability 212 is produced.
Then, in block 402, using the parameter estimate 201 and the uncertain model 204, a sequence of excitation inputs 202 is generated.
Then, in block 403, using the sequence of excitation inputs 202, the uncertain model 204, the parameter estimate 201, the parameter estimate reliability 212, the control-oriented cost function 210, the constraints 203, and the state 206, a control problem is built.
Finally, the control problem is solved, and the command input 211 is determined 404 and applied to the traction-brake actuator 220. The cycle is repeated when a new value for the state 206 is available.
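The cycle of blocks 401 to 404 can be summarized as a single function that chains the three components. The sketch below is only a skeleton: `tasc_cycle` and the three callables passed to it (`update_estimate`, `generate_excitation`, `solve_control`) are hypothetical placeholders for the parameter estimator, the reference generator, and the controller, and their signatures are assumptions made for illustration.

```python
def tasc_cycle(state, predicted_state, last_command, estimate,
               update_estimate, generate_excitation, solve_control):
    """Run one estimation / excitation / control cycle and return the results.

    update_estimate(state, predicted_state, last_command, estimate)
        -> (new_estimate, reliability)              # block 401
    generate_excitation(new_estimate)
        -> sequence of excitation inputs            # block 402
    solve_control(state, new_estimate, reliability, excitation)
        -> sequence of future command inputs        # blocks 403-404
    """
    estimate, reliability = update_estimate(
        state, predicted_state, last_command, estimate)
    excitation = generate_excitation(estimate)
    command_sequence = solve_control(state, estimate, reliability, excitation)
    # Receding horizon: only the first command of the sequence is applied.
    return estimate, reliability, command_sequence[0]

# Usage with trivial stand-in components (hypothetical values).
EST, REL, CMD = tasc_cycle(
    state=[5.0, 1.0], predicted_state=[5.1, 1.0], last_command=-0.2,
    estimate=[0.5, 0.5],
    update_estimate=lambda s, p, c, e: ([0.4, 0.6], 0.5),
    generate_excitation=lambda e: [1.0, -1.0],
    solve_control=lambda s, e, r, exc: [0.5 * v for v in exc],
)
```

The function is called again whenever a new state measurement becomes available, with the previous command and prediction passed back in.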
The method steps described herein can be performed in a microprocessor, field programmable gate array, digital signal processor, or custom hardware.
Parameter Estimator
As shown in
where k is an index of the time step, the regressor matrix M is
$M_k = [A_1 x(k) + B_1 u(k),\ \dots,\ A_l x(k) + B_l u(k),\ B_w w_1,\ \dots,\ B_w w_p]^T$ (10)
T denotes the transpose, and
$\theta(k+1) = [\theta_1(k+1)\ \dots\ \theta_l(k+1)\ \eta_1(k+1)\ \dots\ \eta_p(k+1)]^T$ is the parameter vector.
Then, we update 502 the estimate of the estimation covariance and precision by
where α is a positive filtering constant related to how much the estimate of the unknown parameters should rely on previous estimated values, and it is lower when less reliance on older estimates is desired.
Due to the presence of constraints (6b), a constrained optimization problem is solved to compute the updated estimate of the unknown parameters 503 as
where $\hat{\theta}(k+1) = [\hat{\theta}_1(k+1)\ \dots\ \hat{\theta}_l(k+1)\ \hat{\eta}_1(k+1)\ \dots\ \hat{\eta}_p(k+1)]^T$ is the updated estimate of the unknown parameters.
Together with the estimate of the unknown parameters, a reliability of the estimate γ is computed 504 that is a nonnegative value that is smaller the more the estimate of the unknown parameters is considered reliable, where 0 means that the estimate of the unknown parameters is certainly equal to the correct value of the parameters. In some embodiments of this invention, the estimate reliability is computed as
$\gamma(k+1) = \| v(k+1) - M_k^T \hat{\theta}(k+1) \|_2$ (11a)
or alternatively as
$\gamma(k+1) = \det(P(k+1))$ (11b)
or
$\gamma(k+1) = \operatorname{trace}(P(k+1))$ (11c)
Excitation Input Sequence Reference Generator
We quantify a reduction of uncertainty due to an input sequence in terms of the predicted persistence of excitation measured through a change in the information matrix minimal eigenvalue over the learning time horizon Γ
$\psi(U) = -\lambda_{\min}(R_\Gamma - R_0)$. (12)
Equation (12) is used as an optimization objective function in computing the sequence of excitation inputs.
The estimates of the unknown parameters converge to their true values when the condition $\lambda_{\min}(R_\Gamma - R_0) > 0$ is satisfied for a learning time horizon $\Gamma \in \mathbb{Z}_+$, where $\mathbb{Z}_+$ is the set of positive integers. The information matrix R is
$R_i = \alpha^i R_0 + \sum_{j=0}^{i-1} \alpha^j M_{i-j-1} M_{i-j-1}^T$. (13)
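The information matrix in (13) can also be accumulated step by step, since the closed form is equivalent to the recursion R_{i+1} = αR_i + M_i M_i^T. The sketch below compares both forms and evaluates the excitation condition λmin(R_Γ − R_0) > 0. It is a toy under assumptions: the helper names are hypothetical, there are only two unknown parameters, and each regressor M_i is taken to be a single column (a scalar state), so that M_i M_i^T is an outer product of a length-2 list.

```python
import math

def outer(v):
    """Outer product v v^T of a vector stored as a list."""
    return [[a * b for b in v] for a in v]

def axpy(scale, m, n):
    """Return scale * m + n for equally sized nested-list matrices."""
    return [[scale * a + b for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

def information_closed_form(r0, regressors, alpha, i):
    """Equation (13): alpha^i R_0 + sum_j alpha^j M_{i-j-1} M_{i-j-1}^T."""
    total = [[(alpha ** i) * entry for entry in row] for row in r0]
    for j in range(i):
        total = axpy(alpha ** j, outer(regressors[i - j - 1]), total)
    return total

def information_recursive(r0, regressors, alpha, i):
    """Equivalent recursion R_{s+1} = alpha R_s + M_s M_s^T, applied i times."""
    r = r0
    for step in range(i):
        r = axpy(alpha, r, outer(regressors[step]))
    return r

def excitation_margin(r0, r_gamma):
    """lambda_min(R_Gamma - R_0) for symmetric 2x2 matrices."""
    d = [[r_gamma[i][j] - r0[i][j] for j in range(2)] for i in range(2)]
    mean = 0.5 * (d[0][0] + d[1][1])
    return mean - math.hypot(0.5 * (d[0][0] - d[1][1]), d[0][1])

R0 = [[1.0, 0.0], [0.0, 1.0]]
RICH = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # regressors spanning both directions
POOR = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # regressors along one direction only
MARGIN_RICH = excitation_margin(R0, information_recursive(R0, RICH, 1.0, 3))
MARGIN_POOR = excitation_margin(R0, information_recursive(R0, POOR, 1.0, 3))
```

With the invented data, the sequence that excites both parameter directions gives a strictly positive margin, while the one-directional sequence gives a zero margin, so no information is gained about the second parameter.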
The reference generator 205 determines the excitation input 202 by solving
where the excitation input sequence is
$U_{exc}(k) = [u_{exc,1}^T,\ u_{exc,2}^T,\ \dots,\ u_{exc,\Gamma}^T]^T$.
Considering the train dynamics and the invariant set constraints (14), based on soft landing cone, ensures that the excitation input is feasible.
Because equation (8) is non-convex in U, solving an optimization problem involving (8) directly requires a significant amount of computation and may even be impossible during actual train operation.
Thus, it is a realization of this invention that the entries $[R]_{ij}$ of the information matrix can be expressed as quadratic functions of the command input,
$[R]_{ij} = U^T Q_{ij} U + f_{ij}^T U + c_{ij} = \operatorname{trace}(Q_{ij} U U^T) + f_{ij}^T U + c_{ij}$. (15)
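The identity used in (15), namely that the quadratic term U^T Q U equals trace(Q U U^T), is what allows the quadratic dependence on U to be rewritten as a linear function of the lifted matrix U U^T. The following check is a small numeric sketch with hypothetical helper names and invented values for Q, f, c, and U.

```python
def quadratic_entry(q, f, c, u):
    """Evaluate U^T Q U + f^T U + c directly."""
    n = len(u)
    quad = sum(u[i] * q[i][j] * u[j] for i in range(n) for j in range(n))
    return quad + sum(fi * ui for fi, ui in zip(f, u)) + c

def lifted_entry(q, f, c, u, lifted):
    """Evaluate trace(Q V) + f^T U + c, which is linear in (U, V)."""
    n = len(u)
    trace_qv = sum(q[i][j] * lifted[j][i] for i in range(n) for j in range(n))
    return trace_qv + sum(fi * ui for fi, ui in zip(f, u)) + c

Q = [[2.0, 1.0], [1.0, 3.0]]
F = [0.5, -1.0]
C = 4.0
U = [1.0, -2.0]
U_UT = [[a * b for b in U] for a in U]   # rank-1 lifting V = U U^T

DIRECT = quadratic_entry(Q, F, C, U)
LIFTED = lifted_entry(Q, F, C, U, U_UT)
```

The two evaluations agree only when the lifted matrix equals U U^T, which is why a rank-1 requirement on the lifted variable appears once it is treated as an independent decision variable.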
It is another realization of this invention that by substituting $U U^T$ in equation (15) with a new variable $\tilde{U}$, and enforcing
to be a rank-1 positive semi-definite matrix, thus reformulating equation (14) as
where the inequality constraint $AU - b \le 0$ consolidates constraints $x_{i+1} = \hat{A}_k x_i + \hat{B}_k u_{exc,i}$ and $H_x^{\infty} x_0 + H_u^{\infty} u_{exc,0} \le k^{\infty}$ of (14) into a single group of constraints.
In equation (13), the only constraint that makes the problem difficult to solve is the constraint on the rank of the matrix V, which is the rank-1 constraint rank(V)=1. However, it is realized that such a constraint can be enforced indirectly by an iterative inner-loop/outer-loop decomposition. In particular, the outer loop performs a scalar bisection search, and the inner loop solves a relaxed problem, handling the constraint on the rank of the matrix by solving a sequence of weighted nuclear norm optimization problems using the current value of the bisection parameter from the outer loop.
In this method, parameters $\delta_1, \delta_2 \in \mathbb{R}_+$ and $h_{\max} \in \mathbb{Z}_+$ are used to determine the desired accuracy of the results, i.e., the smaller $\delta_1$ and $\delta_2$ and the larger $h_{\max}$, the higher the accuracy.
which is a relaxed version of (15) where the rank-1 constraint is removed.
Based on the solution of (16), we initialize the variables
Here, $\rho_{\min}$ and $\rho_{\max}$ represent lower and upper bounds on $\lambda_{\min}(R_\Gamma - R_0)$. Then, in block 602, if the lower and upper bounds satisfy the termination condition $(\rho_{\max} - \rho_{\min})/\rho_{\max} \le \delta_1$, we set $U_{exc} = [u_{exc,0}^T\ \dots\ u_{exc,0}^T]^T = U^*$. Otherwise, if $(\rho_{\max} - \rho_{\min})/\rho_{\max} > \delta_1$, we iterate the following operations.
First in block 603, we update the outer-loop variable ρf and initialize the variables of the inner-loop
$\rho_f \leftarrow 0.5(\rho_{\min} + \rho_{\max}),\quad W^{(0)} \leftarrow I,\quad h \leftarrow 0$, (18)
Then, in block 604, we solve
which is a convex optimization problem consisting of the weighted minimization of the nuclear norm. Based on the solution of (17), in block 605 we update
We continue solving (19) and updating by (20) until (block 606) either $\sigma_2(V^{(h-1)}) \le \delta_2\,\sigma_1(V^{(h-1)})$, where $\sigma_i(V)$ denotes the ith singular value of V, or $h = h_{\max}$, or (17) is infeasible, which terminates the inner loop.
We update in block 607 the upper and lower bounds based on the different cases for the subsequent outer-loop update. In the first case, we have found a rank-1 solution, and we set $\rho_{\min} \leftarrow \rho_f$, while in the second and third cases we have not found a solution, and hence we set $\rho_{\max} \leftarrow \rho_f$.
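The outer loop described in blocks 602, 603, and 607 is a scalar bisection on the achievable excitation level. The sketch below shows only that outer loop. The inner loop (the sequence of weighted nuclear norm problems) is abstracted as a hypothetical callable `solve_inner(rho_f)` that returns a rank-1 solution or `None`; the function name `bisect_excitation_level`, the iteration cap `max_outer`, and the stand-in oracle are all assumptions for illustration.

```python
def bisect_excitation_level(rho_min, rho_max, solve_inner, delta_1, max_outer=100):
    """Bisection on the excitation level rho (requires rho_max > 0).

    solve_inner(rho_f) stands in for the inner loop: it returns a rank-1
    solution when one is found for level rho_f, and None otherwise
    (iteration limit reached or relaxed problem infeasible).
    """
    best = None
    for _ in range(max_outer):
        # Block 602: relative gap termination test.
        if (rho_max - rho_min) / rho_max <= delta_1:
            break
        rho_f = 0.5 * (rho_min + rho_max)          # block 603
        solution = solve_inner(rho_f)              # blocks 604-606
        if solution is not None:                   # block 607, first case
            rho_min, best = rho_f, solution
        else:                                      # block 607, other cases
            rho_max = rho_f
    return rho_min, rho_max, best

# Stand-in oracle (hypothetical): levels up to 0.3 are achievable.
def toy_inner(rho_f):
    return ("U_exc for level", rho_f) if rho_f <= 0.3 else None

RHO_LO, RHO_HI, BEST = bisect_excitation_level(0.0, 1.0, toy_inner, delta_1=1e-3)
```

Each outer iteration halves the bracket, so the number of inner-loop solves grows only logarithmically with the requested accuracy.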
Controller
Shown in
First, in block 701, from the current estimate of the unknown parameters obtained from 401, $\hat{\theta}(k) = [\hat{\theta}_1(k)\ \dots\ \hat{\theta}_l(k)\ \hat{\eta}_1(k)\ \dots\ \hat{\eta}_p(k)]^T$, a current estimate of the train dynamics 302 is obtained as
Next, in block 702, from the excitation input sequence $U_{exc}(k)$ computed from 402, from the reliability of the estimate $\gamma(k)$ computed from 401, and from a control-oriented cost function $J_c$ such as
where $P_{cost}$, $Q_{cost}$, $R_{cost}$ are weighting matrices, N is a prediction time horizon, and i is the prediction index, a cost function is constructed as
which includes the control-objective Jc and an additional learning-objective of applying a command close to the one obtained by the excitation input sequence reference generator. The learning objective in (23) is to minimize the sum of squared norm of a difference between components of the sequence of excitation inputs and the sequence of command inputs.
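To see how the reliability value shifts the command between regulation and excitation, consider a deliberately simplified scalar case. The sketch assumes, purely for illustration, a one-step cost of the form J(u) = r·u² + γ·(u − u_exc)², i.e., a quadratic control cost plus the reliability-weighted squared deviation from the excitation input; this form, the function names, and the numbers are assumptions and are not the cost function of the embodiments. Constraints are ignored.

```python
def blended_cost(u, r_cost, gamma, u_exc):
    """Assumed scalar cost: control effort plus weighted excitation tracking."""
    return r_cost * u ** 2 + gamma * (u - u_exc) ** 2

def blended_input(r_cost, gamma, u_exc):
    """Unconstrained minimizer of blended_cost (requires r_cost + gamma > 0).

    Setting dJ/du = 2 r u + 2 gamma (u - u_exc) = 0 gives
    u = gamma * u_exc / (r + gamma).
    """
    return gamma * u_exc / (r_cost + gamma)

U_RELIABLE = blended_input(r_cost=1.0, gamma=0.0, u_exc=2.0)     # estimate trusted
U_BALANCED = blended_input(r_cost=1.0, gamma=1.0, u_exc=2.0)
U_UNRELIABLE = blended_input(r_cost=1.0, gamma=99.0, u_exc=2.0)  # estimate doubtful
```

When γ is zero the command minimizes the control cost alone, and as γ grows the command approaches the excitation input, which matches the intended behavior of favoring learning while the estimate is unreliable.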
Then, from the prediction model 701, the cost function 702, the constraints 203, and the current state 206, a control problem is constructed 703 as
where $U = [u_0\ \dots\ u_{N-1}]$, and by solving it numerically, the command input to the train 211 is computed as $u(k) = u_0$.
Due to the particular construction developed in this paper, when the control-oriented cost function $J_c$ is quadratic as in (22), the solution of (24) can be obtained by a constrained quadratic programming procedure, because the constraints in equation (7) are linear constraints, (21) is linear, and the term added to $J_c$ in equation (23) is quadratic.
Different embodiments of the invented dual control method can use different parameter estimators 213. One embodiment can be based on recursive least squares (RLS) filters, or on constrained RLS filters.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
Number | Name | Date | Kind |
---|---|---|---|
7089093 | Lacote | Aug 2006 | B2 |
8332247 | Bailey | Dec 2012 | B1 |
8478463 | Knott | Jul 2013 | B2 |
8832000 | Jebara | Sep 2014 | B2 |
8838302 | Kumar | Sep 2014 | B2 |
9221476 | Breuer | Dec 2015 | B2 |
20070067678 | Hosek | Mar 2007 | A1 |
20080201027 | Julich | Aug 2008 | A1 |
20080288147 | Cesario | Nov 2008 | A1 |
20090204355 | Vold | Aug 2009 | A1 |
20090299996 | Yu | Dec 2009 | A1 |
20120271587 | Shibuya | Oct 2012 | A1 |
20130116937 | Calhoun | May 2013 | A1 |
20140180573 | Rhea | Jun 2014 | A1 |
20140358339 | Cooper | Dec 2014 | A1 |
20150008293 | Hatazaki | Jan 2015 | A1 |
20150375764 | Rajendran | Dec 2015 | A1 |
Entry |
---|
S. Di Cairano, A. Ulusoy, and S. Haghighat, “Soft-landing control by control invariance and receding horizon control,” in American Control Conference (ACC), 2014. IEEE, 2014. |
H. Genceli and M. Nikolaou, “New approach to constrained predictive control with simultaneous model identification,” AIChE Journal, vol. 42, No. 10, pp. 2857-2868, 1996. |
G. Marafioti, R. Bitmead, and M. Hovd, “Persistently exciting model predictive control using FIR models,” in International Conference Cybernetics and Informatics, No. 2009, 2010, pp. 1-10. |
J. Rathouský and V. Havlena, “MPC-based approximate dual controller by information matrix maximization,” International Journal of Adaptive Control and Signal Processing, vol. 27, No. 11, pp. 974-999, 2013. |
T. A. N. Heirung, B. E. Ydstie, and B. Foss, “An MPC approach to dual control,” in 10th International Symposium on Dynamics and Control of Process Systems (DYCOPS), Mumbai, India, 2013. |
T. A. N. Heirung, B. E. Ydstie, and B. Foss, “An adaptive model predictive dual controller,” in Adaptation and Learning in Control and Signal Processing, vol. 11, No. 1, 2013, pp. 62-67. |
A. Weiss and S. Di Cairano, “Robust dual control MPC with guaranteed constraint satisfaction,” in Proceedings of IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 2014. |
M. S. Lobo and S. Boyd, “Policies for simultaneous estimation and optimization,” in Proceedings of the American Control Conference, San Diego, CA, Jun. 1999. |
K. Mohan and M. Fazel, “Iterative reweighted algorithms for matrix rank minimization,” The Journal of Machine Learning Research, vol. 13, No. 1, pp. 3441-3473, 2012. |
Number | Date | Country | |
---|---|---|---|
20160244077 A1 | Aug 2016 | US |