The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 10 2022 212 638.6 filed on Nov. 25, 2022, which is expressly incorporated herein by reference in its entirety.
The present invention relates to devices and methods for controlling a robot.
Optimal control and more specifically dynamic programming in discrete time is a key concept for controlling multi-DoF (degree of freedom) robots to achieve complex tasks. An optimal control problem is defined via a cost function, which encodes the goal of the task, and constraints that ensure dynamics consistency and control limitations along the optimal state-control trajectory. However, most approaches for designing feasible trajectories for robots with nonlinear dynamics consider the system state to be Euclidean and adapting them to the Riemannian setting is non-trivial. An important challenge when the state (or configuration) space is a Riemannian manifold is the lack of a global vector space, thus recursively solving a dynamic program is not well defined.
Therefore, efficient approaches for controlling a robot device whose space of configurations is given by a Riemannian manifold (such as a sphere) are desirable.
According to various embodiments of the present invention, a method for controlling a technical system is provided comprising:
The above method provides an efficient way to determine control information (i.e. control inputs like control signals for actuators etc.) when the technical system comprises states which are represented by elements of a Riemannian manifold.
Various examples of the present invention are described in the following.
Example 1 is a method for controlling a technical system as described above.
Example 2 is the method of example 1, comprising separating the sequence of control times into a plurality of segments and treating the starting state of each segment as a decision variable when determining the updated control sequence.
In other words, Gauss-Newton Multiple Shooting is applied in a Riemannian iLQC. This increases stability when there are many control time steps.
Example 3 is the method of example 1 or 2, wherein the state approximation parameters linearly approximate the dependency of a change of state of the technical system from the control information in the tangent space of the Riemannian manifold at the state.
This allows considering the impact of the control information on the state with sufficient accuracy while keeping the computational burden sufficiently low to allow efficient control.
Example 4 is the method of any one of examples 1 to 3, wherein the cost approximation parameters quadratically approximate the dependency of a change of control cost of the state from the control information in the tangent space of the Riemannian manifold at the state.
This allows considering the impact of the control information on the control cost with sufficient accuracy while keeping the computational burden sufficiently low to allow efficient control.
Example 5 is the method of any one of examples 1 to 4, wherein the value approximation parameters quadratically approximate the dependency of a change of value of the state from the control information in the tangent space of the Riemannian manifold at the state.
This allows considering the impact of the control information on the value with sufficient accuracy while keeping the computational burden sufficiently low to allow efficient control.
Example 6 is the method of any one of examples 1 to 5, comprising performing multiple iterations comprising, in each iteration from a first to a last iteration,
So, multiple iterations of the update process of example 1 may be performed which increases the performance of the control.
Example 7 is a controller configured to perform a method of any one of the above examples.
Example 8 is a computer program comprising instructions which, when executed by a computer, make the computer perform a method according to any one of the above examples.
Example 9 is a computer-readable medium comprising instructions which, when executed by a computer, make the computer perform a method according to any one of the above examples.
In the figures, similar reference characters generally refer to the same parts throughout the different views. The figures are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the present invention. In the following description, various aspects are described with reference to the figures.
The following detailed description refers to the figures that show, by way of illustration, specific details and aspects of this disclosure in which the present invention may be practiced. Other aspects may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive, as some aspects of this disclosure can be combined with one or more other aspects of this disclosure to form new aspects.
In the following, various examples will be described in more detail.
The robot 100 includes a robot arm 101, for example an industrial robot arm for handling or assembling a work piece (or one or more other objects). The robot arm 101 includes manipulators 102, 103, 104 and a base (or support) 105 by which the manipulators 102, 103, 104 are supported. The term “manipulator” refers to the movable members of the robot arm 101, the actuation of which enables physical interaction with the environment, e.g. to carry out a task. For control, the robot 100 includes a (robot) controller 106 configured to implement the interaction with the environment according to a control program. The last member 104 (furthest from the support 105) of the manipulators 102, 103, 104 is also referred to as the end-effector 104 and may include one or more tools such as a welding torch, gripping instrument, painting equipment, or the like.
The other manipulators 102, 103 (closer to the support 105) may form a positioning device such that, together with the end-effector 104, the robot arm 101 with the end-effector 104 at its end is provided. The robot arm 101 is a mechanical arm that can provide functions similar to those of a human arm (possibly with a tool at its end).
The robot arm 101 may include joint elements 107, 108, 109 interconnecting the manipulators 102, 103, 104 with each other and with the support 105. A joint element 107, 108, 109 may have one or more joints, each of which may provide rotatable motion (i.e. rotational motion) and/or translatory motion (i.e. displacement) to associated manipulators relative to each other. The movement of the manipulators 102, 103, 104 may be initiated by means of actuators controlled by the controller 106.
The term “actuator” may be understood as a component adapted to affect a mechanism or process in response to being driven. The actuator can convert instructions issued by the controller 106 (the so-called activation) into mechanical movements. The actuator, e.g. an electromechanical converter, may be configured to convert electrical energy into mechanical energy in response to driving.
The term “controller” may be understood as any type of logic implementing entity, which may include, for example, a circuit and/or a processor capable of executing software stored in a storage medium, firmware, or a combination thereof, and which can issue instructions, e.g. to an actuator in the present example. The controller may be configured, for example, by program code (e.g., software) to control the operation of a system, a robot in the present example.
In the present example, the controller 106 includes one or more processors 110 and a memory 111 storing code and data based on which the one or more processors 110 control the robot arm 101. According to various embodiments, the controller 106 controls the robot arm 101 on the basis of an optimal control algorithm 112.
In the present example, the system (i.e. here robot) state $r=[x,\dot{x},\ldots]$ is the position on the Riemannian manifold $x\in\mathcal{M}$ and higher order derivatives, such as velocities $\dot{x}$ defined in the tangent space of the manifold at $x$, $\mathcal{T}_x\mathcal{M}$. In the present example, a second order system state $r=[x,\dot{x}]$ is considered. These vectors are considered to be given in embedding space, or ambient coordinates. For example, in case of an $S^1$ manifold, the ambient space is $\mathbb{R}^2$, that is, the second order state is a 4-vector.
In most cases it is easier to evaluate computations directly in the tangent space of the manifold at a given position. The tangent vectors are defined directly in the Euclidean submanifold spanned by the local basis of the tangent space $\mathcal{T}_x\mathcal{M}$. In order to work with tangent vectors in different tangent spaces, a consistent, global definition of the local basis is needed. Therefore, according to various embodiments, the manifold is endowed with an origin $o\in\mathcal{M}$ and a fixed basis $B_o=[b_o^0,\ldots,b_o^N]$, $|B_o|=1$, defined in $\mathcal{T}_o\mathcal{M}$, with $N$ as the embedding, or ambient space, dimension ($N=2$ for $S^1$).
For the $S^1$ manifold the origin may be defined in ambient coordinates at $o=[0,1]^T$ and the basis vector as $B_o=[b_o^0,b_o^1]=[-1,0]$. Then, any tangent vector $v\in\mathbb{R}^1$ can be expressed in ambient space at $x$ as $v_a=B_x v\in\mathcal{T}_x\mathcal{M}$. Similarly, the tangent vector can be recovered from the ambient tangent space vector as $v=B_x^{-1}v_a$. The basis at $x$ can be computed via the parallel transport operation
$B_x=[b_x^0,\ldots,b_x^N]=[\Gamma_{o\to x}(b_o^0),\ldots,\Gamma_{o\to x}(b_o^N)]$. (1)
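The basis construction for the S1 case can be illustrated with a minimal sketch. It assumes that, on the unit circle, parallel transport from the origin to x coincides with the planar rotation that carries the origin to x; the function names are hypothetical and the basis is stored as a single ambient column.

```python
import numpy as np

# Origin and origin basis on S^1 in ambient R^2, as stated in the text.
ORIGIN = np.array([0.0, 1.0])
BASIS_O = np.array([[-1.0], [0.0]])  # one ambient column per tangent dimension


def rotation_origin_to(x):
    """Planar rotation R with R @ ORIGIN == x, for a unit-norm point x."""
    return np.array([[x[1], x[0]],
                     [-x[0], x[1]]])


def basis_at(x):
    """Basis B_x obtained by transporting BASIS_O from the origin to x.

    Assumption of this sketch: on S^1 the transport equals the rotation
    that carries the origin to x.
    """
    return rotation_origin_to(x) @ BASIS_O


def to_ambient(x, v):
    """Ambient representation v_a = B_x v of the tangent coordinate v."""
    return basis_at(x) @ np.atleast_1d(v)


def to_tangent(x, v_a):
    """Recover v from v_a; B_x has orthonormal columns, so B_x^T inverts it."""
    return basis_at(x).T @ v_a


x = np.array([1.0, 0.0])
v_a = to_ambient(x, 0.5)   # ambient tangent vector at x
v = to_tangent(x, v_a)     # back to the 1-D tangent coordinate
```

Because B_x is a 2x1 matrix here, the recovery uses the transpose as a left inverse, which is one possible reading of the inverse in the text.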
In the example of
To roll out a trajectory given the initial state $r_0$ and the control sequence $U_0=[u_0,\ldots,u_{T-1}]$, the manifold dynamics equation $r_{t+1}=\mathbf{f}(r_t,u_t)$ needs to be solved repeatedly (for all $t$ up to the final time $T$). To keep the problem tractable, tangent space computations in $\mathcal{T}_{x_t}\mathcal{M}$ at time step $t$ are used. To this end, the tangent space position $v_t\in\mathcal{T}_{x_t}\mathcal{M}$ is introduced and it is assumed that $\dot{x}_t\in\mathcal{T}_{x_t}\mathcal{M}$. The tangent space dynamics equation is given by
$[v_{t+1},\dot{x}_{t+1}]=f(v_t,\dot{x}_t,u_t)\in\mathcal{T}_{x_t}\mathcal{M}$ (2)
It should be noted that the tangent space dynamics model $f(\cdot)$ is defined in the vector space of $\mathcal{T}_{x_t}\mathcal{M}$, as opposed to the manifold dynamics model $\mathbf{f}(\cdot)$, which maps directly to a manifold state. In practice, it is easier to define and roll out the dynamics in tangent space and impose manifold operations as an additional step; therefore, according to various embodiments, the tangent space dynamics are used.
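After solving Eq. (2), the subsequent state $r_{t+1}=[x_{t+1},\dot{x}_{t+1}]$ is computed by
$x_{t+1}=\mathrm{Exp}_{x_t}(v_{t+1}),\quad \dot{x}_{t+1}=\Gamma_{x_t\to x_{t+1}}(\dot{x}_{t+1})$, (3)
with $\Gamma_{x_t\to x_{t+1}}$ denoting the parallel transport from $x_t$ to $x_{t+1}$.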
The parallel transport is part of the dynamic rollout (in particular here the transition from time t to time t+1 in the rollout).
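One rollout transition can be sketched for the unit sphere S2 as follows. The double-integrator tangent space model, the step size and all function names are assumptions chosen for illustration; only the sequence of tangent space dynamics, exponential map and parallel transport follows the description above.

```python
import numpy as np


def exp_map(x, v):
    """Exponential map on the unit sphere: move from x along tangent v."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * v / n


def transport(x, y, w):
    """Parallel transport of w from T_x to T_y along the shortest geodesic.

    Valid on the unit sphere as long as y is not antipodal to x.
    """
    return w - (y @ w) / (1.0 + x @ y) * (x + y)


def tangent_dynamics(xdot, u, dt):
    """Hypothetical double-integrator model evaluated in T_{x_t}.

    Returns the tangent space position v_{t+1} and the new velocity,
    both still expressed in the tangent space at x_t.
    """
    xdot_next = xdot + dt * u
    v_next = dt * xdot_next
    return v_next, xdot_next


def rollout_step(x, xdot, u, dt=0.01):
    """One transition r_t -> r_{t+1}: tangent dynamics, Exp, then transport."""
    v_next, xdot_next = tangent_dynamics(xdot, u, dt)
    x_next = exp_map(x, v_next)
    return x_next, transport(x, x_next, xdot_next)


x0 = np.array([0.0, 0.0, 1.0])
xdot0 = np.array([1.0, 0.0, 0.0])   # tangent at x0
u0 = np.array([0.0, 2.0, 0.0])      # control in ambient coordinates, tangent at x0
x1, xdot1 = rollout_step(x0, xdot0, u0)
```

After the step, x1 lies on the sphere again and xdot1 lies in the tangent space at x1 with its length preserved by the transport.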
The problem of controlling the robot 100, which is in this example a Riemannian optimal control problem, is defined as finding the control sequence U0=[u0, . . . , uT−1] with initial state r0=[x0, {dot over (x)}0] that minimizes the cost function
with lt as the cost function at time t and with lT as the final cost (i.e. the cost at the final time T). The value of state xt at time t is defined as the optimal cost to go
wherein Jt is the measure of the total cost from time step t (similar to equation (4) where t=0).
The cost may for example be energy cost and reflect how well a goal is achieved (e.g. how well a target location is reached by the robot).
Due to the Principle of Optimality, the optimal control problem can be reformulated using the Bellman equation
Solving the Bellman equation in u yields the optimal control sequence U=[u0, . . . , uT−1]. However, for most real-world problems with nonlinear dynamics models and cost functions, the optimal control problem does not have a closed form solution.
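In standard finite-horizon optimal control, the total cost, the value and the Bellman recursion described above are commonly written as follows. This is a sketch of the textbook form only; taking the full state $r_t$ as the argument and writing the dynamics as $f$ are assumptions made for illustration.

```latex
\begin{align}
J_t(r_t, u_t, \ldots, u_{T-1}) &= l_T(r_T) + \sum_{\tau=t}^{T-1} l_\tau(r_\tau, u_\tau), \\
V_t(r_t) &= \min_{u_t, \ldots, u_{T-1}} J_t(r_t, u_t, \ldots, u_{T-1}), \\
V_t(r_t) &= \min_{u_t} \big[\, l_t(r_t, u_t) + V_{t+1}\big(f(r_t, u_t)\big) \,\big],
\qquad V_T(r_T) = l_T(r_T).
\end{align}
```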
According to various embodiments, a Differential Dynamic Programming approach is used. This includes approximating the nonlinear dynamics and cost with a local linear-quadratic function, in which case the value becomes locally quadratic. Then, the local linear-quadratic sub-problems are solved iteratively until convergence to an optimal solution. This approach is tractable provided that the cost and the dynamics are twice differentiable.
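The local linear approximation of the dynamics can be obtained numerically, for example by central differences about a nominal point. The following sketch assumes a generic tangent space model f(v, u) operating on coordinate vectors; the pendulum-like model is purely hypothetical and serves only to produce non-trivial Jacobians.

```python
import numpy as np


def linearize(f, v_bar, u_bar, eps=1e-6):
    """Central-difference Jacobians of f at the nominal point (v_bar, u_bar)."""
    n_out = f(v_bar, u_bar).size
    A = np.zeros((n_out, v_bar.size))
    B = np.zeros((n_out, u_bar.size))
    for i in range(v_bar.size):
        dv = np.zeros_like(v_bar)
        dv[i] = eps
        A[:, i] = (f(v_bar + dv, u_bar) - f(v_bar - dv, u_bar)) / (2.0 * eps)
    for j in range(u_bar.size):
        du = np.zeros_like(u_bar)
        du[j] = eps
        B[:, j] = (f(v_bar, u_bar + du) - f(v_bar, u_bar - du)) / (2.0 * eps)
    return A, B


def pendulum_step(v, u, dt=0.05):
    """Hypothetical nonlinear model: v = [angle, angular velocity]."""
    theta, omega = v
    omega_next = omega + dt * (-9.81 * np.sin(theta) + u[0])
    return np.array([theta + dt * omega_next, omega_next])


# Linear model delta_v_next ~= A @ delta_v + B @ delta_u about the nominal point.
A, B = linearize(pendulum_step, np.array([0.0, 0.0]), np.array([0.0]))
```

Analytic or automatically differentiated Jacobians could be used instead; finite differences are chosen here only to keep the sketch short.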
Further, this iterative approach is compatible with the Riemannian setting. In essence, the recursive computations of dynamic programs are evaluated in the tangent plane of the current state and parallel transportation is employed when necessary. A local linear dynamics equation is used such that [v,{dot over (x)}]t+1=Ax
It is assumed that the controlled technical system (e.g. the robot 100) has an initial state xt
By rolling out the trajectory with dynamics f(·) as described above a nominal state sequence
where for simplicity the notation of vt is slightly abused to now correspond to the full state in tangent space, including position and velocity. The dynamics equation with the differential states and after dynamics linearization in the tangent space becomes
and consequently
A
A linear-quadratic approximation of the cost function at every time step in the tangent space is used as follows
with
The parameters of (11) may be seen as cost approximation parameters for the state
It should be noted that here the cost function lt is (again with a slight abuse of notation) applied to the delta values to mean the change of the cost function when vt and ut change according to the delta values. A similar notation is used in the following for the value function.
Here it is assumed that the loss function approximation lt(·) can be computed with tangent space vectors. It should be noted that the constant component qt equals the cost along the nominal trajectory. A linear-quadratic form for the value function
$V_t(\delta v_t)=\hat{s}_t+\delta v_t^T s_t+\delta v_t^T S_t\,\delta v_t$
is assumed with variables ŝt,st,St. These include in particular value approximation parameters for approximating the dependency of a change Vt(δvt) of value of the state from the control information ut in the tangent space at
Using the Bellmann equation for finite-horizon problems the value function can be expanded, such that
where
are the transported value function approximation components, as st+1, St+1∈t+1. Using the linear model approximation δvt+1=A
As the gradient of the optimal value function with respect to the control difference δut has to be 0, a closed form solution for the control update can be obtained as
which shows that the optimal control update has a feedforward component $\delta u_t^{ff}=-H_t^{-1}g_t$ and a feedback component with gain $K_t=H_t^{-1}G_t$. Substituting the optimal control update $\delta u_t$ into equation (9) and using the definition of the quadratic value function approximation $V_t(\delta v_t)=\hat{s}_t+\delta v_t^T s_t+\delta v_t^T S_t\,\delta v_t$, the recursive equations
can be obtained, which are initialized with
$S_T=Q_T,\quad s_T=q_T,\quad \hat{s}_T=\hat{q}_T$.
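The backward recursion can be sketched in tangent space coordinates as follows. The sketch is not the listing of the embodiments: it assumes the common convention with a factor 1/2 on all quadratic terms, folds the minus sign into the gain K, and takes an optional, hypothetical `transport` callback that maps the value parameters of step t+1 into the tangent space of step t (identity if omitted).

```python
import numpy as np


def backward_pass(A, B, Q, R, P, q, r, q_hat, S_T, s_T, s_hat_T, transport=None):
    """Riccati-style recursion for a time-indexed linear-quadratic model.

    Assumed model per step t (sketch convention, factor 1/2 on quadratics):
      cost  = q_hat + q'dv + r'du + 0.5 dv'Q dv + 0.5 du'R du + du'P dv
      value = s_hat + s'dv + 0.5 dv'S dv,   dv_next = A dv + B du
    Returns feedforward terms k[t], gains K[t] (du = k + K dv) and the
    value parameters (s_hat, s, S) at the first time step.
    """
    T = len(A)
    S, s, s_hat = S_T, s_T, s_hat_T
    k, K = [None] * T, [None] * T
    for t in reversed(range(T)):
        if transport is not None:
            s, S = transport(t, s, S)          # bring parameters of t+1 to step t
        g = r[t] + B[t].T @ s                  # gradient w.r.t. du
        G = P[t] + B[t].T @ S @ A[t]           # mixed du/dv term
        H = R[t] + B[t].T @ S @ B[t]           # Hessian w.r.t. du
        k[t] = -np.linalg.solve(H, g)
        K[t] = -np.linalg.solve(H, G)
        s_hat = q_hat[t] + s_hat + 0.5 * g @ k[t]
        s = q[t] + A[t].T @ s + G.T @ k[t]
        S = Q[t] + A[t].T @ S @ A[t] + G.T @ K[t]
    return k, K, (s_hat, s, S)


# Scalar one-step check: cost 0.5 du^2, terminal value 2 dv + 0.5 dv^2.
one = np.ones((1, 1))
k, K, (s_hat, s, S) = backward_pass(
    A=[one], B=[one], Q=[0.0 * one], R=[one], P=[0.0 * one],
    q=[np.zeros(1)], r=[np.zeros(1)], q_hat=[0.0],
    S_T=one, s_T=np.array([2.0]), s_hat_T=0.0,
)
```

For the scalar example the minimizer can be checked by hand: du = -1 - 0.5 dv, and the resulting value is -1 + dv + 0.25 dv^2.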
In accordance with the above, various embodiments perform Riemannian iterative linear quadratic control (iLQC) as follows:
In the following, an example algorithm (e.g. corresponding to the optimal control algorithm 112) for the Riemannian iLQC is given.
It should be noted that in the Euclidean case with a quadratic cost and linear dynamics, the algorithm given above converges in one iteration. However, as manifold operations are not generally linear, in the Riemannian case even with quadratic costs and linear tangent space dynamics it may take multiple iterations until convergence.
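The one-iteration convergence in the Euclidean linear-quadratic case can be checked with a small experiment. The system matrices, horizon and cost weights below are arbitrary assumptions, and the recursion uses the factor-1/2 convention with the minus sign folded into the gain; the sketch is a plain Euclidean iLQC, without any manifold operations.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # linear dynamics x_next = A x + B u
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                             # stage cost 0.5 x'Qx + 0.5 u'Ru
R = 0.1 * np.eye(1)
T = 20
x0 = np.array([1.0, 0.0])


def rollout(U):
    """Forward-integrate the dynamics from x0 under the controls U."""
    X = [x0]
    for u in U:
        X.append(A @ X[-1] + B @ u)
    return X


def ilqc_iteration(X, U):
    """One backward-forward pass about the nominal trajectory (X, U)."""
    S, s = Q, Q @ X[T]                    # terminal cost 0.5 x'Qx expanded at X[T]
    k, K = [None] * T, [None] * T
    for t in reversed(range(T)):
        g = R @ U[t] + B.T @ s
        G = B.T @ S @ A
        H = R + B.T @ S @ B
        k[t] = -np.linalg.solve(H, g)
        K[t] = -np.linalg.solve(H, G)
        s = Q @ X[t] + A.T @ s + G.T @ k[t]
        S = Q + A.T @ S @ A + G.T @ K[t]
    X_new, U_new = [x0], []
    for t in range(T):
        U_new.append(U[t] + k[t] + K[t] @ (X_new[t] - X[t]))
        X_new.append(A @ X_new[t] + B @ U_new[t])
    largest_feedforward = max(np.abs(kt).max() for kt in k)
    return X_new, U_new, largest_feedforward


U = [np.zeros(1) for _ in range(T)]
X = rollout(U)
X, U, first_step = ilqc_iteration(X, U)    # non-zero update
X, U, second_step = ilqc_iteration(X, U)   # update vanishes up to round-off
```

The feedforward terms of the second pass vanish because the quadratic model is exact in this setting, so the first pass already reaches the optimum.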
Further, it should be noted that the computation of the backward-forward pass (i.e. backward pass followed by the forward pass) takes place along the nominal trajectory, that is, along {dot over (
As the iLQC relies on a local linear-quadratic approximation of the optimal control problem, an initial nominal trajectory needs to be provided. This is achieved by forward integrating, or single shooting, the initial stabilizing control sequence. Often this control sequence is difficult to obtain, especially in the case of nonlinear dynamics. In practice, a proportional-derivative (PD) controller can be applied, which is straightforward to implement for Riemannian manifolds. However, the trajectory generated by a PD controller might be far from optimal with respect to the cost lt(·).
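A PD controller of this kind can be sketched on the unit sphere S2, with the position error taken as the logarithmic map towards the goal. The gains, step size, double-integrator model and function names are assumptions for illustration; the resulting controls and states could serve as an initial nominal trajectory.

```python
import numpy as np


def exp_map(x, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * v / n


def log_map(x, y):
    """Logarithmic map: tangent vector at x pointing to y, length = distance."""
    c = np.clip(x @ y, -1.0, 1.0)
    d = y - c * x
    n = np.linalg.norm(d)
    if n < 1e-12:
        return np.zeros_like(x)
    return np.arccos(c) * d / n


def transport(x, y, w):
    """Parallel transport from T_x to T_y (y not antipodal to x)."""
    return w - (y @ w) / (1.0 + x @ y) * (x + y)


def pd_rollout(x, xdot, goal, steps=200, dt=0.05, kp=4.0, kd=4.0):
    """Single shooting with a PD law evaluated in the current tangent space."""
    X, U = [x], []
    for _ in range(steps):
        u = kp * log_map(x, goal) - kd * xdot      # PD law in T_x
        xdot_next = xdot + dt * u                  # assumed double integrator
        x_next = exp_map(x, dt * xdot_next)
        xdot = transport(x, x_next, xdot_next)     # velocity into T_{x_next}
        x = x_next
        X.append(x)
        U.append(u)
    return X, U


start = np.array([0.0, 0.0, 1.0])
goal = np.array([1.0, 0.0, 0.0])
X, U = pd_rollout(start, np.zeros(3), goal)
final_distance = float(np.arccos(np.clip(X[-1] @ goal, -1.0, 1.0)))
```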
A disadvantage of single shooting is the accumulation of model approximation errors along the updated trajectory, which might even lead to unstable trajectories. In the following, a Riemannian extension to Gauss-Newton Multiple Shooting is described which addresses this issue and which leads to more stable behaviours.
As mentioned, for nonlinear dynamics systems finding an initial stabilizing control sequence is often nontrivial. With a poor initialization iLQC may not converge to a stable solution. Notice that with iLQC only the control sequence U is optimized and the dynamics are forward integrated to obtain the state trajectory X from xt
Therefore, according to various embodiments, not only the controls are optimized but also the states as additional decision variables. This approach breaks the whole state-control trajectory into smaller pieces (even as small as one time step length) and provides a solution for the short trajectory segments. Instead of single shooting with the full control sequence from the initial state, as with iLQC, this is then multiple shooting of smaller trajectory segments (also denoted as Gauss-Newton Multiple Shooting).
While the number of optimization variables increases, solving the local control problems is significantly more stable, as state changes are not integrated over the whole trajectory, but in the local segments only. The stitching of trajectory segments, that is, the satisfaction of dynamics constraints, is iteratively achieved in a forward-backward computation pass.
For the Gauss-Newton multiple shooting, the difference between the decision variable and the dynamics rollout is defined as the defect dt as
dt=f(xt,{dot over (x)}t,ut)−[Logx
with dt∈. In equation (16) every state xt
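The notion of a defect can be illustrated on the unit sphere S2. The sketch assumes one plausible form, namely that the next decision state is expressed in the tangent space at the current state via the logarithmic map (position) and parallel transport (velocity) and then subtracted from the tangent space rollout; the double-integrator model and all names are hypothetical.

```python
import numpy as np


def exp_map(x, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * v / n


def log_map(x, y):
    """Logarithmic map on the unit sphere."""
    c = np.clip(x @ y, -1.0, 1.0)
    d = y - c * x
    n = np.linalg.norm(d)
    if n < 1e-12:
        return np.zeros_like(x)
    return np.arccos(c) * d / n


def transport(x, y, w):
    """Parallel transport from T_x to T_y (y not antipodal to x)."""
    return w - (y @ w) / (1.0 + x @ y) * (x + y)


def tangent_dynamics(xdot, u, dt):
    """Hypothetical double integrator in the tangent space at the current state."""
    xdot_next = xdot + dt * u
    return dt * xdot_next, xdot_next


def defect(x, xdot, u, x_next, xdot_next, dt=0.05):
    """Gap, in T_x, between the rollout from (x, xdot, u) and the next decision state."""
    v_pred, xdot_pred = tangent_dynamics(xdot, u, dt)
    v_dec = log_map(x, x_next)                   # next position seen from x
    xdot_dec = transport(x_next, x, xdot_next)   # next velocity moved back to T_x
    return np.concatenate([v_pred - v_dec, xdot_pred - xdot_dec])


x = np.array([0.0, 0.0, 1.0])
xdot = np.array([0.5, 0.0, 0.0])
u = np.array([0.0, 1.0, 0.0])
v_pred, xdot_pred = tangent_dynamics(xdot, u, 0.05)

# Decision state that matches the rollout: the defect vanishes.
x_match = exp_map(x, v_pred)
d_consistent = defect(x, xdot, u, x_match, transport(x, x_match, xdot_pred))

# Decision state shifted by 0.1 rad in T_x: the defect reports the gap.
x_off = exp_map(x, v_pred + np.array([0.0, 0.1, 0.0]))
d_gap = defect(x, xdot, u, x_off, transport(x, x_off, xdot_pred))
```

A zero defect for every segment boundary means that the segments are stitched into one dynamically consistent trajectory.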
Using system dynamics linearization along the state sequence the local dynamics model with the defect and differential states δv, δu can be written as
The iLQC approach is applied to arrive at exactly the same local quadratic approximation of the cost function as above (equation (10)). The local quadratic definition of the value function will, however, as opposed to iLQC, include the defect according to
Vt(δvt)=lt(δvt,δut)+Vt+1(Ax
The optimal solution for the differential control δut only differs from the iLQC solution in the computation of the feedforward term
while the computations for Ht and Gt are the same as in equations (14). The recursive equations for Gauss-Newton multiple shooting are given by
with initializations $S_T=Q_T$, $s_T=q_T$, $\hat{s}_T=\hat{q}_T$. After recursively computing the feedforward controls and the feedback gains, the decision variables need to be updated. As in the single shooting iLQC described above, the nominal control for Gauss-Newton multiple shooting is updated as
However, the nominal state trajectory is not obtained by forward integration, but by the following update equation
It should be noted that the state updates (first two of the equations (22)) are computed in the tangent space of xt[k], where the defect dt is also defined. Then, after the state update (third of the equations (22)), the velocity is also transported from xt[k] to the updated state xt+1[k+1] (fourth of the equations (22)).
In summary, according to various embodiments, a method is provided as illustrated in
In 501, an initial control sequence comprising control information for each control time of a sequence of control times is determined.
In 502, an initial state sequence which the technical system follows when being controlled according to the initial control sequence is determined, wherein the initial state sequence comprises states wherein each state is given by an element of a predetermined Riemannian manifold.
In 503, for each state of the initial state sequence,
In 504, an updated control sequence comprising updated control information for each control time is determined by following the sequence of control times forwards and determining the updated values of the control information such that the values of the states of the state sequence which the technical system follows when being controlled according to the updated control sequence are maximized according to the value approximation parameters.
This means that the updated values are determined to maximize the values of the states which the technical system follows when being controlled according to the updated control sequence assuming that the value approximation parameters (correctly) reflect the dependency of the values from the states. This may be done, as explained above, by setting the derivative of the values with respect to the control information to zero to get a condition for determining the updated control information such that the above condition (maximizing the values) is fulfilled.
In 505, the technical system is controlled according to the updated control sequence.
The approach of
It may in particular be applied to robotic manipulators (SE(3) manifold of robotic end-effectors), or to mobile robots, humanoids or quadrupeds in joint space (torus manifold, product of hypersphere manifolds). The Riemannian manifold may for example be (or include) the S3 manifold, which is the space where unit quaternions live, which are often used in robotics to represent orientation in 3D space. There is a direct conversion between S3 and SO(3), but SO(3) is a bit more difficult to work with. Another option is the space of positive semi-definite matrices, e.g. covariances (or any square matrix with non-negative, real eigenvalues). This SPD manifold (which looks like a cone in embedding space) may be used to define force/velocity manipulability. It should be noted that the state space may in various embodiments also be a combination of Riemannian manifolds (e.g. a combination of a Euclidean space and orientation in a non-Euclidean manifold). Those may in that case together be considered as one Riemannian manifold whose elements give (i.e. represent) the states.
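The conversion from S3 to SO(3) mentioned above can be sketched with the standard unit quaternion to rotation matrix formula. The scalar-first ordering q = [w, x, y, z] and the function name are assumptions of this sketch.

```python
import numpy as np


def quat_to_rotmat(q):
    """Rotation matrix of a quaternion q = [w, x, y, z] (normalized internally)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


# Rotation by 90 degrees about the z axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R_z90 = quat_to_rotmat(q)
R_neg = quat_to_rotmat(-q)   # the antipodal quaternion gives the same rotation
```

The example also shows that q and -q map to the same rotation, so two points of S3 correspond to each element of SO(3).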
Various embodiments may receive and use sensor data from various sensors, such as video (camera), radar, LiDAR, ultrasonic, thermal imaging, motion or sonar sensors, for example to determine the technical system's state that is reached with a certain control. For example, measurement and control may be performed, i.e. data (e.g. scalar time series, in particular sensor data) may be analyzed and then the technical system may be operated accordingly.
According to one embodiment, the method is computer-implemented.
Although specific embodiments of the present invention have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific embodiments discussed herein.
Number | Date | Country | Kind |
---|---|---|---|
10 2022 212 638.6 | Nov 2022 | DE | national |