The invention relates to a system and method for controlling actuators of an articulated robot.
The traditional way of programming complex robots is sought to become more intuitive, such that not only experts but also shopfloor workers, in other words laymen, are able to utilize robots for their work. The terms “skill-” and “task-based programming” are very important in this context. “Skills” are, in particular, a formal representation of predefined actions or movements of the robot. Several approaches to programming with skills exist, e.g., publications [1], [2], and [3]. They mostly view skills independently from the controller, i.e., the controller only executes commands calculated by the skill implementation. From this it can be seen that the underlying controller is a common factor for manipulation skills and thus provides a set of parameters that is shared by them. It is, however, generally neither efficient nor always feasible to use the same parameter values for all manipulation skills. Typically, it is not even possible for the same skill in different environments.
Depending on the particular situation, the parameters have to be adapted in order to account for different environment properties such as rougher surfaces or different masses of involved objects. Within given boundaries of certainty, the parameters could be chosen such that the skill is fulfilled optimally, or at least close to optimally, with respect to a specific cost function. In particular, this cost function and its constraints are usually defined by the human user with some intention, e.g., low contact forces, short execution time or low power consumption of the robot. A significant problem in this context is the tuning of the controller parameters in order to find regions in the parameter space that minimize such a cost function, or that are feasible in the first place, without necessarily having any pre-knowledge about the task other than the task specification and the robot's abilities.
Several approaches have been proposed that cope with this problem in different ways. In publication [4], learning motor skills by demonstration is described. In publication [5], a Reinforcement Learning based approach to acquiring new motor skills from demonstration is introduced. The authors of publications [6], [7] employ Reinforcement Learning methods to learn motor primitives that represent a skill. In publication [8], a supervised learning-by-demonstration approach is used with dynamic movement primitives to learn bipedal walking in simulation. An early approach utilizing a stochastic real-valued reinforcement learning algorithm in combination with a nonlinear multilayer artificial neural network in order to learn robotic skills can be found in publication [9]. Soft robotics is shown in publication [10], while impedance control to apply the idea to complex manipulation problems is shown in publication [11]. An adaptive impedance controller is introduced in publication [12]. Its feed-forward command and stiffness are adapted during execution, depending on the motion error and based on four physically meaningful meta parameters. From this, the question arises how these meta parameters are to be chosen with respect to the environment and the problem at hand.
It is the objective of the invention to provide a system and a method for improved learning of robot manipulation skills.
A first aspect of the invention relates to a system for controlling actuators of an articulated robot and for enabling the robot to execute a given task, including:
(S, O, Cpre, Cerr, Csuc, R, χcmd, X, P, Q) with
S: a Cartesian product of I subspaces ζi: S=ζi=1×ζi=2× . . . ×ζi=I
with i={1, 2, . . . , I} and I≥2,
O: a set of physical objects,
Cpre: a precondition,
Cerr: an error condition,
Csuc: a success condition,
R: a nominal result of ideal skill execution,
χcmd: skill commands,
X: physical coordinates,
P: skill parameters, with P consisting of three subsets Pt, Pl, PD, with Pt being the parameters resulting from a priori knowledge of the task, Pl being the parameters not known initially which need to be learned and/or estimated during execution of the task, and PD being constraints of parameters Pl,
Q: a performance metric, wherein Q(t) denotes the actual performance of the skill carried out by the robot, and
Preferably, the subspaces ζi include a control variable, in particular, a desired variable, or an external influence on the robot or a measured state, in particular, an external wrench including, in particular, an external force and an external moment.
A preferred adaptive controller is derived as follows:
Consider the robot dynamics
M(q)q̈ + C(q, q̇)q̇ + g(q) = τu + τext  (1)
where M(q) denotes the symmetric, positive definite mass matrix, C(q, q̇)q̇ the Coriolis and centrifugal torques and g(q) the gravity vector. The control law is defined as:
τu(t) = J(q)^T(−Fff − K(t)e − Dė) + τr  (2)
where Fff(t) denotes the feed-forward wrench, K(t) the stiffness matrix, D the damping matrix and J(q) the Jacobian. The position and velocity errors are denoted by e = [et, er]^T and ė = ẋ* − ẋ, respectively. et = x* − x is the translational position error and er = θ* − θ the rotational angle-axis error. The dynamics compensator τr is defined as:
τr = M(q)q̈* + C(q, q̇)q̇* + g(q) − sign(ϵ)
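As a minimal illustration, the control law (2) can be sketched per control cycle as follows; the function and variable names are illustrative, and a 6-dimensional task space with an n-joint robot is assumed:

```python
import numpy as np

def control_torque(J, e, e_dot, F_ff, K, D, tau_r):
    """Sketch of control law (2): tau_u = J(q)^T(-F_ff - K e - D e_dot) + tau_r.

    J      -- (6, n) Jacobian of the robot at configuration q
    e      -- (6,) stacked translational/rotational pose error
    e_dot  -- (6,) velocity error
    F_ff   -- (6,) feed-forward wrench
    K, D   -- (6, 6) stiffness and damping matrices
    tau_r  -- (n,) dynamics compensation torque
    """
    F = -F_ff - K @ e - D @ e_dot  # commanded Cartesian wrench
    return J.T @ F + tau_r         # map to joint torques, add compensation
```

The adaptive part of the controller then updates Fff and K(t) between calls.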
The feed-forward wrench Fff is defined as:
Fff(t) = Fd(t) + ∫_{t0}^{t} α(ϵ(σ) − γα Fff(σ)) dσ + Fff,0
where Fd(t) is an optional initial time-dependent trajectory and Fff,0 is the initial value of the integrator. The controller adapts feed-forward wrench and stiffness via the per-sample adaptation steps δFff(t) = Fff(t) − Fff(t−T) and δK(t) = K(t) − K(t−T).
The adaptive tracking error is defined as
ϵ = e + κė  (9)
with κ > 0. The positive definite matrices α and β represent the learning rates for the feed-forward wrench and the stiffness, respectively, while γα and γβ are the corresponding forgetting factors. Damping D is designed according to publication [21] and T is the sample time of the controller.
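A minimal per-sample sketch of this adaptation, treating each Cartesian axis separately: the update form δFff = α(ϵ − γα·Fff) and δK = β(ϵ − γβ·k) is an assumption, chosen so that the products cα = αγα and cβ = βγβ used in the constraint derivation below appear as forgetting gains:

```python
def adapt_step(eps, F_ff, k, alpha, beta, gamma_a, gamma_b):
    """One adaptation sample (sketch, per Cartesian axis).

    eps   -- adaptive tracking error eps = e + kappa * e_dot
    F_ff  -- feed-forward wrench component from the previous sample
    k     -- stiffness component from the previous sample
    Returns the updated (F_ff, k); the assumed increments are
    dF_ff = alpha * (eps - gamma_a * F_ff) and dk = beta * (eps - gamma_b * k).
    """
    dF = alpha * (eps - gamma_a * F_ff)  # learning term minus forgetting term
    dk = beta * (eps - gamma_b * k)
    return F_ff + dF, k + dk
```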
With the explanations above, the preferred adaptive controller is basically given.
Preferred γα and γβ are derived via constraints as follows:
The first constraint of an adaptive impedance controller is the upper bound K̇max on the speed of stiffness adaptation. Inserting cα := αγα and cβ := βγβ into (8) leads, together with the bounded stiffness rate of change, to the relation:
If it is assumed that K(t = 0) = 0 and ė = 0, emax is preferably defined as the amount of error at which
holds. Furthermore, Kmax denotes the absolute maximum stiffness, another constraint for any real-world impedance controlled robot. Then, the maximum value for β can be written as:
Since δK(t) = 0 and ė = 0 when Kmax is reached, (10) can be rewritten as
Finally, the adaptation parameters become
Finding the adaptation of the feed-forward wrench is preferably done analogously. This way, the upper limits for α and β are, in particular, related to the inherent system capabilities Kmax and Fmax, leading to the fastest possible adaptation.
With the above explanations, preferred γα and γβ are derived.
The introduced skill formalism focuses, in particular, on the interplay between the abstract skill, meta learning (by the learning unit) and adaptive control. The skill provides, in particular, desired commands and trajectories to the adaptive controller, together with meta parameters and other relevant quantities for executing the task. In addition, a skill provides, in particular, a quality metric and a parameter domain to the learning unit, while receiving, in particular, the learned set of parameters used in execution. The adaptive controller commands, in particular, the robot hardware via desired joint torques and receives sensory feedback. Finally, the skill formalism, in particular, makes it possible to easily connect to a high-level task planning module. The specification of robot skills s is preferably provided as follows by the first unit:
The following preferred skill formalism is object-centered, in the sense that the notion of the manipulated objects is of main concern. The advantage of this approach is its simple notation and intuitive interpretability. The aspect of greater intuitiveness is based on the similarity to natural language:
Definition 1 (Skill): A skill s is an element of the skill-space. It is defined as a tuple (S, O, Cpre, Cerr, Csuc, R, χcmd, X, P, Q).
Definition 2 (Space): Let S be the Cartesian product of I subspaces ζi: S = ζi=1 × ζi=2 × . . . × ζi=I, with i = {1, 2, . . . , I} and I ≥ 2.
Preferably, the subspaces ζi include a control variable, in particular, a desired variable, or an external influence on the robot or a measured state, in particular, an external wrench including, in particular, an external force and an external moment.
Definition 3 (Object): Let o represent a physical object with coordinates oX(t) ∈ S associated with it. O denotes the set of physical objects o ∈ O relevant to a skill s with no = |O| and no > 0. Moreover, X(t) is defined as the tuple of the coordinates oX(t) of all objects o ∈ O.
Definition 4 (Task Frame): The task frame oRTF(t) denotes the rotation from frame TF to the base frame 0. Note that we assume oRTF(t)=const.
Definition 5 (Parameters): P denotes the set of all skill parameters consisting of three subsets Pt, Pl and PD. The set Pt⊂P contains all parameters resulting from a priori task knowledge, experience and the intention under which the skill is executed. In this context, Pt is also referred to as the task specification. The set Pl⊂P contains all other parameters that are not necessarily known beforehand and need to be learned or estimated. In particular, it contains the meta parameters (α, β, γα, γβ) for the adaptive controller. The third subset PD⊂P defines the valid domain for Pl, i.e., it consists of intervals of values for continuous parameters or sets of values for discrete ones. Thus, PD determines the boundaries when learning Pl.
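Definition 5 can be sketched as a small container type; the field and parameter names are illustrative, not prescribed by the formalism:

```python
from dataclasses import dataclass, field

@dataclass
class SkillParameters:
    """Skill parameters P as the union of Pt (task specification),
    Pl (parameters to learn/estimate) and PD (learning domain for Pl)."""
    Pt: dict = field(default_factory=dict)
    Pl: dict = field(default_factory=dict)
    PD: dict = field(default_factory=dict)

    def in_domain(self, name, value):
        """Check a candidate value for a learned parameter against PD."""
        dom = self.PD[name]
        if isinstance(dom, tuple):  # continuous parameter: (low, high) interval
            low, high = dom
            return low <= value <= high
        return value in dom         # discrete parameter: explicit set of values
```

For example, SkillParameters(PD={"alpha_t": (0.0, 1.0)}).in_domain("alpha_t", 0.3) evaluates to True.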
Conditions: There are preferably three condition types involved in the execution of a skill: preconditions, failure conditions and success conditions. They all share the same basic definition, yet their application is substantially different. Their purpose is to define the borders and limits of the skill from start to end:
Definition 6 (Condition): Let C⊂S be a closed set and c(X(t)) a function c: S→B where B={0, 1}. A condition holds iff c(X(t))=1. Note that the mapping itself depends on the specific type of condition.
Definition 7 (Precondition): Cpre denotes the chosen set for which the precondition defined by cpre(X(t)) holds. The condition holds, i.e., cpre(X(t0))=1, iff ∀x∈X: x(t0)∈Cpre. t0 denotes the time at the start of the skill execution. This means that at the beginning of skill execution the coordinates of every involved object must lie in Cpre.
Definition 8 (Error Condition): Cerr denotes the chosen set for which the error condition cerr(X(t)) holds, i.e., cerr(X(t))=1. This follows from ∃x∈X: x(t)∈Cerr. If the error condition is fulfilled at time t, skill execution is interrupted. Assumptions about how this error state is resolved are not made herein, since this depends, in particular, on the actual skill implementation and the capabilities of the high-level control and planning agency.
Definition 9 (Success Condition): Csuc denotes the chosen set for which the success condition defined by csuc(X(t)) holds, i.e., csuc(X(t))=1 iff ∀x∈X: x(t)∈Csuc. If the coordinates of all involved objects are within Csuc the skill execution can terminate successfully. With this it is not stated that the skill has to terminate.
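The different quantifiers in Definitions 7 to 9 (all objects for Cpre and Csuc, any object for Cerr) can be sketched as follows, with the set-membership tests passed in as hypothetical callables:

```python
def precondition_holds(X, in_Cpre):
    """Cpre holds iff the coordinates of EVERY involved object lie in Cpre."""
    return all(in_Cpre(x) for x in X)

def error_condition_holds(X, in_Cerr):
    """Cerr holds iff the coordinates of ANY involved object lie in Cerr."""
    return any(in_Cerr(x) for x in X)

def success_condition_holds(X, in_Csuc):
    """Csuc holds iff the coordinates of EVERY involved object lie in Csuc."""
    return all(in_Csuc(x) for x in X)
```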
Definition 10 (Nominal Result): The nominal result R∈S is the ideal endpoint of skill execution, i.e., the convergence point. Although the nominal result R is the ideal goal of the skill, its execution is nonetheless considered successful if the success conditions Csuc hold. Even then, X(t) converges to this point. However, it is possible to blend from one skill to the next if two or more are queued.
Definition 11 (Skill Dynamics): Let X: [t0, ∞) → S be a general dynamic process, where t0 denotes the start of the skill execution. The process can terminate if (∀ csuc∈Csuc: csuc(X(t))=1)∨(∃ cerr∈Cerr: cerr(X(t))=1).
It converges to the nominal result R. This dynamic process encodes what the skill actually does depending on the input, i.e., the concrete implementation. This is preferably one of: a trajectory generator, a DMP, or some other algorithm calculating sensor based velocity or force commands. The finish time te is not necessarily known a priori. For example, for a search skill it cannot be determined when it terminates because of the very nature of the search problem.
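The termination predicate of Definition 11 can be sketched as a simple execution loop; step, success and error are hypothetical callables standing in for the concrete skill implementation and the condition tests:

```python
def execute_skill(step, X0, success, error, max_steps=10000):
    """Run the skill dynamics until all success conditions hold ("success"),
    any error condition holds ("error"), or a step budget is exhausted
    ("timeout") -- the finish time te is not known a priori (e.g. search skills)."""
    X = X0
    for _ in range(max_steps):
        if error(X):
            return "error", X
        if success(X):
            return "success", X
        X = step(X)  # concrete implementation: trajectory generator, DMP, ...
    return "timeout", X
```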
Definition 12 (Commands): Let χcmd⊂X(t) be the skill commands, i.e., a desired trajectory consisting of velocities and forces defined in TF sent to the controller.
Definition 13 (Quality Metric): Q denotes the set of all 2-tuples (w, fq(X(t))) with 0 < w < 1 and constraints fc,i(X(t)). Furthermore, let q = Σi wi fq,i(X(t)) ∀ (wi, fq,i(X(t))) ∈ Q.
The quality metric is a means of evaluating the performance of the skill and to impose quality constraints on it. This evaluation aims at comparing two different implementations of the same skill or two different sets of parameters P. The constraints can, e.g., be used to provide a measure of quality limits for a specific task (e.g., a specific time limit). Note, that the quality metric reflects some criterion that is derived from the overall process in which the skill is executed or given by a human supervisor. Moreover, it is a preferred embodiment that a skill has several different metrics to address different demands of optimality.
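The weighted sum of Definition 13 can be sketched directly; Q is represented here as a list of (weight, metric-function) pairs:

```python
def quality(Q, X):
    """Scalar quality q = sum_i w_i * f_qi(X) over all pairs (w_i, f_qi) in Q."""
    return sum(w * f(X) for w, f in Q)
```

For instance, quality([(0.5, lambda X: 2.0), (0.5, lambda X: 4.0)], X) yields 3.0 for any X.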
With the above, the specification of robot skills s is provided in a preferable way from the first unit.
The learning unit is preferably derived as follows:
The learning unit applies meta learning, which in particular means finding the right (optimal) parameters p*∈Pl for solving a given task. Requirements: In order to learn the controller meta parameters together with other parameters such as execution velocity, several potentially suitable learning methods are to be evaluated. The method will face the following issues:
Therefore, a suitable learning algorithm will have to fulfill the subsequent requirements:
Preferably, one of the following algorithms or a combination thereof for meta learning is applied in the learning unit: Grid Search, Pure Random Search, Gradient-descent family, Evolutionary Algorithms, Particle Swarm, Bayesian Optimization.
Generally, gradient-descent based algorithms require a gradient to be available. Grid search, pure random search and evolutionary algorithms typically do not account for stochasticity and cannot handle unknown constraints without extensive knowledge about the problem they optimize, i.e., without well-informed barrier functions. The latter point also applies to particle swarm algorithms. Only Bayesian optimization according to publication [25] is capable of explicitly handling unknown noisy constraints during optimization. Another major requirement is that little or, ideally, no manual tuning be necessary. Choosing, for example, learning rates or making explicit assumptions about noise would break with this intention. Obviously, this requirement depends to a great deal on the concrete implementation, but also on the optimizer class and its respective requirements.
Considering all mentioned requirements, the Spearmint algorithm known from publications [26], [27], [28], and [25] is preferably applied. This particular implementation requires no manual tuning; it is only required to specify the prior and the acquisition function once in advance.
More preferably, a Bayesian Optimization is applied. Preferably, it is realized and implemented as follows:
In general, Bayesian optimization (BO) finds the minimum of an unknown objective function f(p) on some bounded set χ by developing a statistical model of f(p). Apart from the cost function, it has two major components: the prior and the acquisition function. Prior: In particular, a Gaussian process is used as prior to derive assumptions about the function being optimized. The Gaussian process has a mean function m: χ → ℝ and a covariance function K: χ × χ → ℝ. As a kernel, preferably the automatic relevance determination (ARD) Matérn 5/2 kernel is used, which is given by:
This kernel has d + 3 hyperparameters in d dimensions, i.e., one characteristic length scale per dimension, the covariance amplitude θ0, the observation noise ν and a constant mean m. These kernel hyperparameters are integrated out by applying Markov chain Monte Carlo (MCMC) via slice sampling, as described in publication [29]. Acquisition function: Preferably, predictive entropy search with constraints (PESC) is used as a means to select the next parameters x to explore, as described in publication [30]. Cost function: Preferably, a cost metric Q defined as above is used directly to evaluate a specific set of parameters Pl. Also, the success or failure of the skill can be evaluated by using the conditions Csuc and Cerr. Bayesian optimization can make direct use of the success and failure conditions as well as of the constraints in Q, as described in publication [25].
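A sketch of the ARD Matérn 5/2 covariance between two parameter vectors, with one characteristic length scale per dimension (the standard closed form of this kernel; variable names are illustrative):

```python
import numpy as np

def matern52_ard(x, x2, theta0, lengthscales):
    """ARD Matern 5/2 kernel:
    k(x, x') = theta0 * (1 + sqrt(5)*r + 5*r^2/3) * exp(-sqrt(5)*r),
    with r^2 = sum_d (x_d - x'_d)^2 / l_d^2 (one length scale per dimension)."""
    r2 = np.sum((x - x2) ** 2 / lengthscales ** 2)
    r = np.sqrt(r2)
    return theta0 * (1.0 + np.sqrt(5.0) * r + 5.0 * r2 / 3.0) * np.exp(-np.sqrt(5.0) * r)
```

At x = x', the kernel returns the covariance amplitude theta0; the covariance decays with the scaled distance between the inputs.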
The invention presents the following advantages: The adaptive controller from publication [12] is extended to Cartesian space and full feed-forward tracking. A novel meta parameter design for the adaptive controller based on real-world constraints of impedance control is provided. A novel formalism to describe robot manipulation skills and to bridge the gap between high-level specification and low-level adaptive interaction control is introduced. Meta learning via Bayesian optimization (publication [14]), which is frequently applied in robotics (publications [16], [17], [18]), provides the missing computational link between adaptive impedance control and high-level skill specification. A unified framework that composes adaptive impedance control, meta learning and skill specification into a closed-loop system is introduced.
According to an embodiment of the invention the adaptive controller adapts feed forward wrench and stiffness via δFff=Fff(t)−Fff(t−T).
According to another embodiment of the invention the learning unit carries out a Bayesian and/or a HiREPS optimization/learning.
HiREPS is the acronym of “Hierarchical Relative Entropy Policy Search”.
According to another embodiment of the invention, the system includes a data interface with a data network, and the system is designed and set up to download system-programs for setting up and controlling the system from the data network.
According to another embodiment of the invention, the system is designed and set up to download parameters for the system-programs from the data network.
According to another embodiment of the invention, the system is designed and set up to enter parameters for the system-programs via a local input-interface and/or via a teach-in-process, with the robot being manually guided.
According to another embodiment of the invention, the system is designed and set up such that downloading system-programs and/or respective parameters from the data network is controlled by a remote station, and wherein the remote station is part of the data network.
According to another embodiment of the invention, the system is designed and set up such that system-programs and/or respective parameters locally available at the system are sent to one or more participants of the data network based on a respective request received from the data network.
According to another embodiment of the invention, the system is designed and set up such that system-programs with respective parameters available locally at the system can be started from a remote station, and wherein the remote station is part of the data network.
According to another embodiment of the invention, the system is designed and set up such that the remote station and/or the local input-interface includes a human-machine-interface HMI designed and set up for entry of system-programs and respective parameters and/or for selecting system-programs and respective parameters from a multitude of system-programs and respective parameters.
According to another embodiment of the invention, the human-machine-interface HMI is designed and set up such that entries are possible via “drag-and-drop”-entry on a touchscreen, a guided dialogue, a keyboard, a computer-mouse, a haptic interface, a virtual-reality-interface, an augmented-reality interface, an acoustic interface, via a body-tracking interface, based on electromyographic data, based on electroencephalographic data, via a neuronal interface, or a combination thereof.
According to another embodiment of the invention, the human-machine-interface HMI is designed and set up to deliver auditory, visual, haptic, olfactory, tactile, or electrical feedback, or a combination thereof.
Another aspect of the invention relates to a robot with a system as shown above and in the following.
Another aspect of the invention relates to a method for controlling actuators of an articulated robot and for enabling the robot to execute a given task, the robot including a first unit, a second unit, a learning unit, and an adaptive controller, the second unit being connected to the first unit, and further to the learning unit and to the adaptive controller, including the following steps:
(S, O, Cpre, Cerr, Csuc, R, χcmd, X, P, Q) with
S: a Cartesian product of I subspaces ζi: S=ζi=1×ζi=2× . . . ×ζi=I
with i={1, 2, . . . , I} and I≥2,
O: a set of objects,
Cpre: a precondition,
Cerr: an error condition,
Csuc: a success condition,
R: a nominal result of ideal skill execution,
χcmd: skill commands,
X: physical coordinates,
P: skill parameters, with P consisting of three subsets Pt, Pl, PD, with Pt being the parameters resulting from a priori knowledge of the task, Pl being the parameters not known initially which need to be learned and/or estimated during execution of the task, and PD being constraints of the parameters Pl,
Q: a performance metric, wherein Q(t) denotes the actual performance of the skill carried out by the robot,
wherein the second unit is connected to the first unit and further to a learning unit and to the adaptive controller and wherein the skill commands χcmd include the skill parameters Pl,
Preferably, the subspaces ζi include a control variable, in particular, a desired variable, or an external influence on the robot or a measured state, in particular, an external wrench including, in particular, an external force and an external moment.
Another aspect of the invention relates to a computer system with a data processing unit, wherein the data processing unit is designed and set up to carry out a method according to one of the preceding alternatives.
Another aspect of the invention relates to a digital data storage with electronically readable control signals, wherein the control signals can cooperate with a programmable computer system such that a method according to one of the preceding alternatives is carried out.
Another aspect of the invention relates to a computer program product including program code stored on a machine-readable medium for executing a method according to one of the preceding alternatives when the program code is executed on a computer system.
Another aspect of the invention relates to a computer program with program code for executing a method according to one of the preceding alternatives when the computer program runs on a computer system.
The sources of prior art mentioned above and additional sources are the following publications:
In the drawings:
S = {x, R, Fext, τext}, where x ∈ ℝ^3 is the position in Cartesian space, R ∈ ℝ^{3×3} is the orientation, Fext = [fext, mext]^T ∈ ℝ^6 is the wrench of the external forces and torques and τext ∈ ℝ^n is the vector of external joint torques, where n denotes the number of joints. Objects are O = {r, p, h}, where r is the robot 80, p the object or peg 3 grasped with the robot 80 and h the hole 5. Cpre = {X ∈ S | fext,z > fcontact, x ∈ U(x), g(r, p) = 1} states that the robot 80 shall sense a specified contact force fcontact and the peg 3 has to be within the region of interest ROI 1, which is defined by U(·). The function g(r, p) simplifies the condition of the robot r 80 having grasped the peg p 3 to a binary mapping. Csuc = {X ∈ S | xz > xz,0 + d} states that the peg 3 has to be partially inserted by at least d into the hole 5 for the skill to terminate successfully. Ideally, d is the depth of the hole 5.
Cerr = {X ∈ S | x ∉ U(x), τext > τmax} states that the skill fails if the robot 80 leaves the ROI 1 or the external torques exceed some specified safety limit component-wise. P = {Pt, Pl} with Pt = {a, d, T̃, r} and Pl = {αt, αr, βt, βr, Fff,0, vt, vr}. a is the amplitude of the Lissajous curves, d is the desired depth, T̃ is the pose estimate of the hole 5 and r is the radius of the region of interest ROI 1. The controller parameters α, β and Fff,0 are applied as in the general description above. v is a velocity and the indices t, r refer to translational and rotational directions, respectively. Qtime = {te − ts, fz,max = maxt fext,z}, where ts and te are the start and end time of the skill execution and fext,z is the external force in z-direction. This metric aims to minimize the execution time and, simultaneously, to comply with a maximum level of contact forces in the direction of insertion.
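The conditions of this peg-in-hole example can be sketched as predicates; all numeric values (contact force threshold, ROI radius, insertion depth, torque limits) are illustrative placeholders, and insertion along +z is assumed as in the condition xz > xz,0 + d:

```python
import numpy as np

# Illustrative constants (hypothetical values, not from the specification):
f_contact = 2.0             # required contact force in z [N]
roi_center = np.zeros(3)    # center of the region of interest U(x)
roi_radius = 0.05           # r: ROI radius [m]
x_z0, depth = 0.0, 0.02     # hole surface height and desired depth d [m]
tau_max = np.full(7, 30.0)  # per-joint external torque safety limit [Nm]

def c_pre(x, f_ext_z, grasped):
    """Cpre: contact force sensed, peg inside the ROI, peg grasped (g(r,p)=1)."""
    return f_ext_z > f_contact and np.linalg.norm(x - roi_center) <= roi_radius and grasped

def c_suc(x):
    """Csuc: peg inserted by at least d (insertion direction assumed along +z)."""
    return x[2] > x_z0 + depth

def c_err(x, tau_ext):
    """Cerr: robot left the ROI, or any external joint torque exceeds its limit."""
    return np.linalg.norm(x - roi_center) > roi_radius or np.any(np.abs(tau_ext) > tau_max)
```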
Definition 1 (Skill): A skill s is an element of the skill-space. It is defined as a tuple (S, O, Cpre, Cerr, Csuc, R, χcmd, X, P, Q).
Definition 2 (Space): Let S be the Cartesian product of I subspaces ζi: S = ζi=1 × ζi=2 × . . . × ζi=I, with i = {1, 2, . . . , I} and I ≥ 2.
Definition 3 (Object): Let o represent a physical object with coordinates oX(t) ∈ S associated with it. O denotes the set of all objects o ∈ O relevant to a skill s with no = |O| and no > 0. Moreover, X(t) is defined as the tuple of the coordinates oX(t) of all objects o ∈ O.
Definition 4 (Task Frame): The task frame oRTF(t) denotes the rotation from frame TF to the base frame 0. It is assumed that oRTF(t) = const.
Definition 5 (Parameters): P denotes the set of all skill parameters consisting of three subsets Pt, Pl and PD. The set Pt⊂P contains all parameters resulting from a priori task knowledge, experience and the intention under which the skill is executed. Pt is also referred to as the task specification. The set Pl⊂P contains all other parameters that are not necessarily known beforehand and need to be learned or estimated. In particular, it contains the meta parameters (α, β, γα, γβ) for the adaptive controller 104. The third subset PD⊂P defines the valid domain for Pl, i.e., it consists of intervals of values for continuous parameters or sets of values for discrete ones. Thus, PD determines the boundaries when learning Pl.
Conditions: There are three condition types involved in the execution of a skill: preconditions, failure conditions and success conditions. They all share the same basic definition, yet their application is substantially different. Their purpose is to define the borders and limits of the skill from start to end:
Definition 6 (Condition): Let C⊂S be a closed set and c(X(t)) a function c: S→B where B={0, 1}. A condition holds iff c(X(t))=1. The mapping itself depends on the specific type of condition.
Definition 7 (Precondition): Cpre denotes the chosen set for which the precondition defined by cpre(X(t)) holds. The condition holds, i.e., cpre(X(t0))=1, iff ∀ x∈X: x(t0)∈Cpre. t0 denotes the time at the start of the skill execution. This means that at the beginning of skill execution the coordinates of every involved object must lie in Cpre.
Definition 8 (Error Condition): Cerr denotes the chosen set for which the error condition cerr(X(t)) holds, i.e., cerr(X(t))=1. This follows from ∃ x∈X: x(t)∈Cerr. If the error condition is fulfilled at time t, skill execution is interrupted. No assumptions are made about how this error state is resolved since this depends on the actual skill implementation and the capabilities of high-level control and planning agency.
Definition 9 (Success Condition): Csuc denotes the chosen set for which the success condition defined by csuc(X(t)) holds, i.e., csuc(X(t))=1 iff ∀ x∈X: x(t)∈Csuc. If the coordinates of all involved objects are within Csuc the skill execution can terminate successfully.
Definition 10 (Nominal Result): The nominal result R∈S is the ideal endpoint of skill execution, i.e., the convergence point.
Although the nominal result R is the ideal goal of the skill, its execution is nonetheless considered successful if the success condition Csuc holds. Even then, X(t) converges to this point.
Definition 11 (Skill Dynamics): Let X: [t0, ∞) → S be a general dynamic process, where t0 denotes the start of the skill execution. The process terminates if (∀ csuc∈Csuc: csuc(X(t))=1)∨(∃cerr∈Cerr: cerr(X(t))=1).
It converges to the nominal result R. This dynamic process encodes what the skill actually does depending on the input, i.e., the concrete implementation. This is a trajectory generator, a DMP, or some other algorithm calculating sensor based velocity or force commands. The finish time te is not necessarily known a priori. For a search skill it cannot be determined when it terminates because of the very nature of the search problem.
Definition 12 (Commands): Let χcmd⊂X(t) be the skill commands, i.e., a desired trajectory consisting of velocities and forces defined in TF sent to the controller.
Definition 13 (Quality Metric): Q denotes the set of all 2-tuples (w, fq(X(t))) with 0 < w < 1 and constraints fc,i(X(t)). Furthermore, let q = Σi wi fq,i(X(t)) ∀ (wi, fq,i(X(t))) ∈ Q.
The quality metric is a means of evaluating the performance of the skill and to impose quality constraints on it. This evaluation aims at comparing two different implementations of the same skill or two different sets of parameters P. The constraints are used to provide a measure of quality limits for a specific task (e.g., a specific time limit). The quality metric reflects some criterion that is derived from the overall process in which the skill is executed or given by a human supervisor.
S: a Cartesian product of I subspaces ζi: S=ζi=1×ζi=2× . . . ×ζi=I
with i={1, 2, . . . , I} and I≥2,
O: a set of all objects,
Cpre: a precondition,
Cerr: an error condition,
Csuc: a success condition,
R: a nominal result of ideal skill execution,
χcmd: skill commands,
X: physical coordinates,
P: skill parameters, with P consisting of three subsets Pt, Pl, PD, with Pt being the parameters resulting from a priori knowledge of the task, Pl being the parameters not known initially which need to be learned and/or estimated during execution of the task, and PD being constraints of the parameters Pl,
Q: a performance metric, wherein Q(t) denotes the actual performance of the skill carried out by the robot 80,
Each of
S: a Cartesian product of I subspaces ζi: S=ζi=1×ζi=2× . . . ×ζi=I
with i={1, 2, . . . , I} and I≥2,
O: a set of all physical objects,
Cpre: a precondition,
Cerr: an error condition,
Csuc: a success condition,
R: nominal result of ideal skill execution,
χcmd: skill commands,
X: physical coordinates,
P: skill parameters, with P consisting of three subsets Pt, Pl, PD, with Pt being the parameters resulting from a priori knowledge of the task, Pl being the parameters not known initially which need to be learned and/or estimated during execution of the task, and PD being constraints of the parameters Pl,
Q: a performance metric, wherein Q(t) denotes the actual performance of the skill carried out by the robot 80,
S: a Cartesian product of I subspaces ζi: S=ζi=1×ζi=2× . . . ×ζi=I
with i={1, 2, . . . , I} and I≥2,
O: a set of all physical objects,
Cpre: a precondition,
Cerr: an error condition,
Csuc: a success condition,
R: a nominal result of ideal skill execution,
χcmd: skill commands,
X: physical coordinates,
P: skill parameters, with P consisting of three subsets Pt, Pl, PD, with Pt being the parameters resulting from a priori knowledge of the task, Pl being the parameters not known initially which need to be learned and/or estimated during execution of the task, and PD being constraints of the parameters Pl,
Q: a performance metric, wherein Q(t) denotes the actual performance of the skill carried out by the robot 80,
wherein the adaptive controller 104 receives skill commands χcmd,
wherein the skill commands χcmd include the skill parameters Pl,
wherein based on the skill commands χcmd the controller 104 controls the actuators of the robot 80 via a control signal τd, wherein the actual status X(t) of the robot 80 is sensed by respective sensors and/or estimated by respective estimators and fed back to the controller 104 and to the second unit 102, wherein based on the actual status X(t) the second unit 102 determines the performance Q(t) of the skill carried out by the robot 80, and wherein the learning unit 103 receives PD and Q(t) from the second unit 102, determines updated skill parameters Pl(t) and provides Pl(t) to the second unit 102 to replace the hitherto existing skill parameters Pl.
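The closed loop described above can be sketched as an outer meta-learning loop; the skill, controller, learner and robot objects are hypothetical stand-ins for the first/second unit, the adaptive controller 104 and the learning unit 103:

```python
def learning_loop(skill, controller, learner, robot, episodes):
    """Each episode: execute the skill with the current learned parameters Pl,
    evaluate the performance Q from the logged states X(t), and let the
    learning unit propose the next Pl within the domain PD."""
    Pl = learner.initial(skill.PD)
    for _ in range(episodes):
        X_log = robot.execute(controller, skill.commands(Pl))  # tau_d to actuators, X(t) fed back
        Q = skill.quality(X_log)                               # second unit: performance Q(t)
        Pl = learner.update(Pl, Q, skill.PD)                   # learning unit: updated Pl
    return Pl
```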
Number: 10 2017 005 081.3 | Date: May 2017 | Country: DE | Kind: national
The present application is the U.S. National Phase of PCT/EP2018/064059, filed on 29 May 2018, which claims priority to German Patent Application No. 10 2017 005 081.3, filed on 29 May 2017, the entire contents of which are incorporated herein by reference.
Filing Document: PCT/EP2018/064059 | Filing Date: 5/29/2018 | Country: WO | Kind: 00