The present description relates to a method for controlling a robotic device and to robot control units.
To carry out a task by a robot, a user typically specifies a sequence plan, which may include the execution of various actions. Some of these actions may be primitive actions provided by the robot, which do not have to be adapted to the control situation, such as “open gripper.” Actions may, however, also be movement skills of a type, in which it is desirable for the robot to be able to adapt to the respective control situation. Thus, for example, the action “pick up the object” is a function of the initial position and of the initial orientation of the object.
For such skills, it is possible to train robot trajectory models. For example, a method is described in L. Schwenkel, M. Guo, and M. Bürger, “Optimizing sequences of probabilistic manipulation skills learned from demonstration,” in Conference on Robot Learning, 2019, (which is identified below as reference [1]), with the aid of which each movement skill is trained separately under various scenarios and is not bound to any specific task.
Approaches for controlling a robot (or, in general, a robotic device) are desirable, which enable an efficient implementation of a sequence plan, of both primitive actions as well as movement skills for which robot trajectory models have been learned.
According to various specific example embodiments of the present invention, a method is provided for controlling a robotic device, including providing demonstrations for carrying out each of multiple movement skills, training a robot trajectory model for each movement skill from the demonstrations, receiving a sequence plan for a task to be carried out by the robotic device, the sequence plan including a sequence to be carried out from the movement skills and primitive actions, ascertaining a reduced sequence plan from the sequence plan by omitting the primitive actions, generating a composite robot trajectory model from the robot trajectory models of the movement skills for the reduced sequence plan if two movement skills according to the reduced sequence plan are to be carried out in succession, cascading the robot trajectory models of the movement skills and controlling the robot according to the composite robot trajectory model, the control of the robot according to the composite robot trajectory model being interrupted after the movement skill is carried out and the one or multiple primitive actions being executed and the control of the robot according to the composite robot trajectory model being resumed, if after a movement skill according to the sequence plan one or multiple primitive actions is/are to be carried out before the next movement skill.
According to one further specific embodiment of the present invention, a robot control unit is provided, which is configured to carry out the above described method.
The method for controlling a robotic device and the robot control unit described above enables an efficient control of a robotic device for a task, for which the robotic device is intended to carry out movement skills learned from demonstrations as well as primitive actions. It is possible, in particular, to control the robotic device according to a composite robot trajectory model, even if the movement skills, of the learned robot trajectory models of which the composite robot trajectory model is made up, are interrupted by primitive actions.
Various exemplary embodiments are provided below.
Exemplary embodiment 1 is a method for controlling a robotic device as described above.
Exemplary embodiment 2 is a method according to exemplary embodiment 1, a state in the composite robot trajectory model, which is an end state of the robot trajectory model of the movement skill being identified, if according to the sequence plan one or multiple primitive actions is/are to be carried out after a movement skill, the control of the robot according to the composite robot trajectory model being interrupted if the state in the composite robot trajectory model during controlling of the robot has been reached, the one or multiple primitive action(s) being carried out and the control of the robot according to the composite robot trajectory model then being resumed.
The inclusion of the primitive actions based on the end states of the robot trajectory models of the various movement skills ensures that the primitive actions are carried out at the correct time in the control sequence. The characteristic of the primitive actions to not change the position of the robot also ensures that the subsequent movement skills according to the composite robot trajectory model are able to be executed.
Exemplary embodiment 3 is a method according to exemplary embodiment 1 or 2, there being for each movement skill of the robot trajectory model a hidden semi-Markov model including one or multiple initial state(s) and one or multiple end state(s), the method further including the training of a pre-condition model for each movement skill from the demonstrations, which includes for each initial state of the robot trajectory model of the movement skill a probability distribution of robot configurations before the execution of the movement skill, and of an end condition model for each movement skill, which includes for each end state of the robot trajectory model of the movement skill a probability distribution of robot configurations after the execution of the movement skill and the composite robot trajectory model being generated by
This enables an automatic correct composition of robot trajectory models of the movement skills.
Exemplary embodiment 4 is the method according to exemplary embodiment 1, each movement skill including a manipulation of one or of multiple object(s) by the robot and the hidden semi-Markov model of each movement skill being a task-parameterized hidden semi-Markov model including task parameters that correspond to a robot configuration and object configuration, in which the movement skill is applied.
Thus, various specific embodiments enable an efficient training and an efficient control of a robotic device for handling one or multiple object(s).
Exemplary embodiment 5 is the method according to exemplary embodiment 4, which further includes the training of an effect model for each movement skill, which includes for each end state of the robot trajectory model of the movement skill a probability distribution of how the robot configuration and/or object configuration is/are changed after the execution of the movement skill relative to an initial robot configuration and/or object configuration, to which the movement skill is applied, if two movement skills are to be carried out in a sequence in the task, the cascading of the robot trajectory model of the movement skill including the transformation of the task-parameterized hidden semi-Markov model of the second movement skill, so that its task parameters are task parameters that correspond to the robot configuration and/or object configuration, which is/are provided by the effect model of the first movement skill.
Thus, it may be ensured that a control trajectory over a sequence of movement skills is correctly calculated.
Exemplary embodiment 6 is the method according to one of exemplary embodiments 1 through 5, the generation of the composite robot trajectory model including the repeated cascading of robot trajectory models, cascaded robot trajectory models and combined robot trajectory models and the combining of robot trajectory models, cascaded robot trajectory models and a combined robot trajectory according to the task, so that the composite robot trajectory model is a trajectory model for the entire task.
As a result, a composite model for a complex task may be iteratively determined, which ensures robustness for a complex task and frees an operator from the need to define a complex hierarchy of branching conditions.
Exemplary embodiment 7 is a robot control unit, which is configured to carry out the method according to one of exemplary embodiments 1 through 6.
Exemplary embodiment 8 is a computer program including commands which, when they are executed by a processor, cause the processor to carry out a method according to exemplary embodiments 1 through 6.
Exemplary embodiment 9 is a computer-readable medium, which stores commands which, when they are executed by a processor, cause the processor to carry out a method as recited in one of exemplary embodiments 1 through 6.
In the figures, similar reference numerals relate in general to the same parts in all the different views. The figures are not necessarily true to scale, the emphasis instead being placed in general on the representation of the features of the present invention. Various aspects of the present invention are described in the following description with reference to the figures.
The following detailed description refers to the accompanying drawings, which show for the purpose of explanation specific details and aspects of this description, in which the present invention may be carried out. Other aspects may be used and structural, logical and electrical changes may be carried out without departing from the scope of protection of the present invention. The various aspects of this description are not necessary mutually exclusive, since several aspects of this description may be combined with one or multiple other aspect(s) of this description in order to form new aspects.
Various examples are described in greater detail below.
Robot 100 includes a robotic arm 101, for example, an industrial robotic arm for handling or mounting a workpiece (or one or multiple other object(s)). Robotic arm 101 includes manipulators 102, 103, 104 and a base (or support) 105, by which manipulators 102, 103, 104 are supported. The term “manipulator” refers to the moving elements of robotic arm 101, whose actuation allows for a physical interaction with the surroundings, for example, in order to carry out a task. For controlling, robot 100 includes a (robotic) control unit 106, which is configured to implement the interaction with the surroundings according to a control program. Final element 104 (furthest away from support 105) of manipulators 102, 103, 104 is also referred to as end effector 104 and may include one or multiple tool(s) such as, for example, a welding torch, a gripping tool, a painting tool or the like.
The other manipulators 102, 103 (closer to support 105) may form a positioning device, so that together with end effector 104, robotic arm 101 including end effector 104 at its end is provided. Robotic arm 101 is a mechanical arm, which is able to perform functions similar to a human arm (possibly with a tool at its end).
Robotic arm 101 may include articulated elements, 107, 108, 109, which connect manipulators 102, 103, 104 to one another and to support 105. An articulated element 107, 108, 109 may include one or multiple articulated joint(s), each of which is able to provide a rotational movement (i.e., rotation) and/or a translational movement (i.e., a displacement) of associated manipulators relative to one another. The movement of manipulators 102, 103, 104 may be initiated with the aid of actuators that are controlled by control unit 106.
The term “actuator” may be understood to be a component, which is designed, in response to being driven, for influencing a mechanism or process. The actuator is able to convert instructions output by control device 106 (the so-called activation) into mechanical movements. The actuator, for example, an electromechanical converter, may be configured to convert electrical energy into mechanical energy in response to its activation.
The term “control unit” may be understood to be any type of logic that implements an entity, which may, for example, include a circuit and/or a processor, which is able to execute software stored in a memory medium, firmware or a combination thereof, and which may output the instructions, for example, to an actuator in the present example. The control unit may, for example, be configured to control the operation of a robotic device by using program code (for example, software).
In the present example, control unit 106 includes one or multiple processor(s) 110 and a memory 111, which stores code and data, on the basis of which processor 110 controls robotic arm 101. According to various specific embodiments, control unit 106 controls robotic device 101 on the basis of a statistical model 112 stored in memory 111. Robot 100 is intended, for example, to manipulate object 114.
At any point in time, the entire system made up of robotic arm 101 and object 114 (or also further objects) to be manipulated is in a particular state with respect to position, orientation, end effector state (gripper open/closed) etc. This state of the system, of the robot or of an object is referred to below as (system/robot/object)-configuration, in order to avoid confusion with the states of the statistical model, which are passed through during controlling.
A robot 100 may utilize methods of learning from demonstration (LfD) in order to learn to carry out a task or to cooperate with a human partner. Human demonstrations may be coded by a probabilistic model 112 (also referred to as a statistical model), which represents the nominal plan of the task for the robot. Control unit 106 may subsequently use statistical model 112, which is also referred to as a robot trajectory model, in order to generate desired robotic movements, potentially as a function of the configuration of both the human partner as well as the surroundings.
The basic idea of LfD is to adapt a prescribed movement skill model such as, for example, GMMs to a set of demonstrations. M demonstrations should be present, of which each Tm contains data points for a data set of N=ΣmTm total observations ξ={ξt}t=1N, wherein ξt∈d. It is assumed that the same demonstrations are recorded from the perspective of P different coordinate systems (provided by the task parameters such as, for example, local coordinate systems or reference frameworks of objects of interest). One usual way of obtaining such data is to transform the demonstrations from a static global reference framework to a reference framework p by ξt(p)=A(p)−1(ξt−b(p). Here, {(b(p),A(p))}p=1P is the translation and rotation of (local) reference framework p with respect to a global coordinate system (i.e., the global reference framework). A TP-GMM is then described by model parameters {πk,{μk(p),Σk(p)}p=1P}k=1K, K representing the number of Gaussian components in the mixed model, πk being the previous probability of each component and {μk(p),Σk(p)}p=1P being the parameters of the k-th Gaussian component within reference framework p.
In contrast to the standard GMM, the above mixed model is not able to be learned separately for each reference framework. In fact, the mixed coefficients πk of all reference frameworks are divided and the k-th component in reference framework p must be mapped onto the corresponding k-th component in the global reference framework. Expectation maximization (EM) is an established method for learning such models.
Once it is learned, the TP-GMM may be used during the execution in order to reproduce a trajectory for the learned movement skill. Given observed reference frameworks {b(p),A(p)}p=1P, the learned TP-GMM is namely converted into a single GMM including parameters {πk,({circumflex over (μ)}k,{circumflex over (Σ)}k)}k=1K by multiplying the affinely transformed Gaussian components across various reference frameworks, as follows
the parameters of the updated Gaussian bell at each reference framework p being calculated as
{circumflex over (μ)}k(p)=A(p)μk(p)+b(p) and {circumflex over (Σ)}k(p)=A(p)Σk(p)A(p)T. Although the task parameters may vary over time, the time index is omitted due to the notation.
Hidden semi-Markov models (HSMMs) expand hidden standard Markov models (HMMs) by embedding pieces of time information of the underlying stochastic process. This means that whereas in HMM the underlying hidden process is assumed to be Markov, i.e., the probability of the transition to the next state is only a function of the instantaneous state, in HSMM, the state process is assumed to be semi-Markov. This means that a transition to the next state is a function of the instantaneous state as well as of the elapsed time since the state has been entered. They may be applied in combination with TP-GMMs for robot movement skills coding in order to learn spatial-temporal features of the demonstration. A task-parameterized HSMM model (TP-HSMM model) is defined as:
ahk being the transition probability from state h to k; (μkD,σkD)) describing the Gaussian distributions for the duration of state k, i.e., the probability that state k is retained for a particular number of successive steps; {πk,{μk(p),Σk(p)}p=1P}k=1K being the same as previously introduced TP-GMM, which represents the observation probability that corresponds to state k. It should be noted here that the number of states represents the number of Gaussian components in the “connected” TP-GMM.
Given a particular (partial) sequence of observed data points , it is to be assumed that the associated sequence of states in Θ is provided by st=s1s2 . . . st. The probability that data point ξt belongs to state k (i.e., st=k) is provided by forward variable αt(k)=p(st=k,{:
αt(k)=Στ=1t−1Σh=1Kαt−τ(h)ahk(τ|μkD,σkD)oτt, (2),
|{circumflex over (μ)}k,{circumflex over (Σ)}k) being the emission probability and ({circumflex over (μ)}k,{circumflex over (Σ)}k) of (1) being derived given the task parameters. Furthermore, the same forward variable may also be used during the reproduction in order to predict future steps up to Tm.
Since, however, in this case future observations are not available, only pieces of transition information and duration information are used, i.e. by setting |{circumflex over (μ)}k,{circumflex over (Σ)}k)=1 for all k and >t in (2). Finally, the sequence of the most probable states s*T
A desired final observation of the robot state should now be provided as ξT, T being the movement skill time horizon (for example, the average length over the demonstrations). Furthermore, the initial robot state is observed as ξ1. To execute the movement skill (i.e., the movement skill reproduction) given learned model Θa, only the most probable state sequence s*T is constructed given only ξ1 and ξT.
The reproduction using the forward variable may not directly take place in this case, since the forward variable in equation (2) calculates the sequence of marginally most probable states, whereas what is desired is the commonly most probable sequence of states given ξ1 and ξT. As a result, if (2) is used, there is no guarantee that the returned sequence s*T corresponds to both the spatial-temporal patterns of the demonstrations as well as to the final observation. With respect to one example for picking up an object, it may return a most probable sequence, which corresponds to “pick up from the side,” even if the desired end configuration is that the end effector is located at the upper side of the object.
To overcome this problem, a modification of the Viterbi algorithm is used according to one specific embodiment. The classic Viterbi algorithm may be used in order to find the most probable sequence of states (also referred to as Viterbi path) in HMMs, which result in a given flow of observed events. According to one specific embodiment, a method is used, which differs from the former in two main aspects: (a) it operates on HSMM instead of HMM; and more significantly (b) most observations aside from the first and the last are lacking. In the absence of observations, the Viterbi algorithm becomes, in particular,
pj(d)=(d|μjD,σjD) being the duration probability of state j, δt(j) being the probability that the system is in state j at time t and not in state j with t+1; and
({circumflex over (μ)}j,{circumflex over (Σ)}j) being global Gaussian component j in Θa of (1) given ξt. At any time t and for each state j, the two arguments, which maximize equation δt(j) are namely recorded and a simple backtracking procedure is used in order to find most probable state sequence s*T. In other words, the above algorithm derives the most probable sequence s*T for movement skill a, which results in final observation ξT starting from ξ1.
If the robot task space is represented by temporally varying locations (including position and orientation) of the end effector, classic euclidic-based methods are unsuitable for processing such data. Thus, according to various specific embodiments, the robot task space is furnished with a Riemannian manifold . For each point x, there exists in manifold a tangent space x. This enables euclidic operations to be carried out locally while they are geometrically consistent with manifold restrictions.
The exponential mapping and the logarithmic mapping may be used in order to map points between x and . Exponential mapping Expx:x→ maps a point in the tangent space of point x onto a point on the manifold, while the geodetic distance is maintained. The inverse operation is referred to as logarithmic mapping Logx:→x. Another useful operation is the parallel transport
which moves elements between tangent spaces without introducing a distortion. The exact form of the above-mentioned operations is a function of the Riemannian metric that is assigned to the manifold. According to various specific embodiments, Riemannian manifolds are used in order to correctly calculate statistics about using Riemannian normal distributions, which code the observed movement patterns and retrieve the control actions corresponding to the task schedule (for example, sequenced movement skills) using an optimal Riemannian control unit.
For the following explanations, a robotic arm 101 having multiple degrees of freedom is considered as an example, whose end effector 104 includes a state xe∈3×3×1 (which describes the Cartesian position, the orientation quaternion and the gripper configuration), which operates within a static and known operating space. There are also objects of interest within the range of arm 101, which are identified with O={o1,o2, . . . , oj}, each of which includes a configuration xo
In such a scenario, a human user carries out multiple kinesthetic demonstrations on the arm in order to manipulate one or multiple object(s) according to particular movement skills.
The set of demonstrated movement skills is to be identified with A={a1,a2, . . . , aH}. Furthermore, for the movement skill a∈A, the set of objects that is involved is indicated by Oa and the set of available demonstrations is identified with Da. It should be noted that all demonstrations follow the structure fixed on objects, which has been introduced above, i.e., they are recorded for multiple reference frameworks (i.e., from the perspective of multiple reference frameworks), which are normally assigned to the objects in Oa, which frequently represent the object position in the operating space. The movement skill “insert the pin into the cylinder” includes, for example, the objects “pin” and “cylinder” and the associated demonstrations are recorded both from the robot reference framework, the “pin” reference framework as well as the “cylinder” reference framework.
The (manipulation) task, which is considered below, is made up of a sequence of movement skills a*, which are selected from the demonstrated movement skills A. An insertion task, for example, entails “picking up the cap, reorienting the cap, picking up the cap again and inserting the cap.” At the end of the task, a target configuration G is achieved as the desired end configuration of the system, including the robot and the objects.
The usual way to organize manipulation tasks in the factory is via a diagram or a sequence plan. They are usually defined by drawing and storing in a GUI (graphic user interface).
The manipulation task is made up of a sequence of movement skills 201 through 209, each skill being either a movement skill a∈A learned from demonstrations as described above, i.e., being a movement skill, for which an HSMM has been learned, or a predefined primitive action (for example, a gripper command). For example, the manipulation task is a task for inserting a cap and includes
The actions close gripper 203, 207, open gripper 250, 209 and locate cap 201 are predefined primitive actions, which entail for example 6D pose estimations (locate cap 201) whereas the other actions are learned movement skills which, like above-described models, are learned from demonstrations.
The sequence plan may also include branchings. For example, movement skills “bring gripper into position from above to pick up an object,” “bring gripper into position from the left to pick up objects” or “bring gripper into position from the right to pick up objects” may be executed as a function of the initial configuration of the object. In this context, the sequence plan may indicate multiple sequences of movement skills and primitive actions to be carried out, which result from the various branchings (and thus from which branching conditions the respective control situation meets).
This means that the robot may have a set of movement skills (pre-programmed primitive actions or movement skills learned via demonstrations) pre-installed, and for a specific manipulation task, the operator manually constructs a diagram that describes the implementation of this task (for example, as in
Demonstrations of movement skills are carried out at 301.
The movement skills include at least those movement skills that are necessary for executing a task, which is provided by a sequence plan 303. Sequence plan 303, as described above with reference to
For a demonstrated movement skill a∈A as described above, the set of available demonstrations is provided by Da={ξt}t=1N, which are recorded in P reference framework. It should be noted that such reference frameworks are bound directly to the objects in Oa.
At 302, a robot trajectory model (also referred to as “robot behavior model”) is learned for each movement skill.
As described above, given a correctly selected number of components K, for example, TP-HSMM model Θa, which abstracts the spatial-temporal features of trajectories with respect to a movement skill a, may be learned using an algorithm of the EM type (expectation maximization).
In 304, a composite robot trajectory model is generated from the learned robot trajectory models learned in 302.
For this purpose, the movement skills (without the primitive actions) are drawn from the sequence plan in 304. In the example of
To be able to generate a composite robot trajectory model from the trajectory models of the various movement skills, the learning for each movement skill encompasses, in addition to the learning of the trajectory model, further the learning of a precondition model, of an end condition model and of an effect model. Using this model, a composite robot trajectory model 305 is constructed for the movement part (i.e., the sequence of the movement skills without the primitive actions) of the task established by sequence plan 303. Descriptively speaking, the precondition model encapsulates how the system should be before the execution of the movement skill, whereas the effect model and the end condition model encapsulate how the system should be changed after the execution of the movement skill. These models are an important part for calculating the composite robot trajectory model, since they measure the compatibility between movement skills and track the development of the system configuration (i.e., the configuration of the robot and of the objects involved). It should be noted that the term “movement skill model” may be understood in such a way that it includes the robot trajectory model, the precondition model, the end condition model and the effect model for the respective movement skill.
As described with reference to
Consequently, the generation of the composite model includes the recursive application of combination operations for combining movement skills in the sequence and of an operation for combining in parallel the movement skills.
The combination operation for cascading a sequence of movement skills, as represented in
For the combination of two movement skills that are to be executed in a sequence, the trajectory models of the two movement skills are cascaded in the following manner to form a composite trajectory model.
Given two TP-HSMMs Θa
It should be noted that the calculation and the update of lines 3 and 4 of algorithm 1 may be carried out according to equations (4) and (5), which are provided below.
A key finding may be considered to be that the same model Θa
According to one specific embodiment, a precondition model and an effect model as described in reference [1] are used. The learned precondition model, identified with Γ1,a, contains, in particular, TP-GMMs for the initial robot configurations (for example, initial position and/or location of the robot), i.e., Γ1,a={({circumflex over (μ)}1(p),{circumflex over (Σ)}1(p)), ∀p∈P1,a}, P1,a being the selected set of task parameters, which are derived from the initial system configuration (i.e., from the initial configuration (for example, position and/or location) of the robot and/or the objects). An end condition model is also introduced here, identified with ΓT,a, which is learned in a manner similar to Γ1,a, but for the end configuration, i.e., ΓT,a={({circumflex over (μ)}T(p),{circumflex over (Σ)}T(p)), ∀p∈PT,a}, PT,a being the selected set of reference frameworks derived from the end system configuration. Simply put, Γ1,a models the initial configuration before the execution of movement skill a, whereas ΓT,a models the end configuration thereafter. The learned effect model, identified with Γ1T,a, further contains TP-GMMs for the predicted end system configuration, i.e., Γ1T,a={{({circumflex over (μ)}1,o(p),{circumflex over (Σ)}1,o(p)), ∀p∈P1,a}, ∀o∈Oa∪e}, P1,a being defined in Γ1,a. These three models differ in that the task parameters for ΓT,a are calculated from the end system configuration (after the implementation of a), whereas those for Γ1,a and Γ1T,a are extracted from the initial system configuration (before the implementation of a). It is written Γa{Γ1,a,ΓT,a,Γ1T,a}.
The transition probability from an end component kf of Θa
KL(⋅∥⋅) being the KL Divergence (Kullback-Leibler-Divergence), ΓT,a
Given an end component kf of Θa
({circumflex over (μ)}k({circumflex over (p)}),{circumflex over (Σ)}k({circumflex over (p)}))(μk(p),Σk(p))⊗(bk
operation ⊗ being defined as the same operation of (1); (bk
Finally, as indicated in algorithm 1, other model parameters of {circumflex over (Θ)} such as, for example, duration probabilities, initial and end distributions are established in a trivial manner with slight changes of Θa
For the combination of two movement skills, which are to be executed in parallel, the trajectory models of the two movement skills are combined to form a composite trajectory model as follows.
Two TP-HSMMs Θa
Algorithm 2 is made up of two parts: one part in order to calculate composite TP-HSMM model {circumflex over (Θ)} and another part in order to calculate composite TPGMM model {circumflex over (Γ)}. The first and most important step is to update component indices of Θ2 by the total number of components in Θ1. This is to avoid multiple components of various movement skills including the same index. Subsequently, all of the associated TPGMM model, duration model, precondition model and effect model must be updated accordingly. Last but not least, when the composite transition matrix {akh} is calculated, {akh}1 of Θ1 and {akh}2 of Θ2 are diagonally added to {akh}, whereas the remaining entries are filled with zero. This means that no additional transitions of Θ1 and Θ2 are added, since they are composed in parallel (i.e., not in a sequence).
In summary, the generation of the composite model in 304 encompasses the repeated application of the following operations of two movement skills:
These two operations are, in particular, repeatedly carried out as follows:
When composite robot trajectory model 305 has been generated, the task for a given situation 306 (for example, initial configuration of the system, etc.) defined by sequence plan 303 may be executed.
For this purpose, the robot control unit considers the initial system configuration in given situation 306 and determines, for example, by applying equation (3) to composite robot trajectory model 305, the most probable sequence 307 of states (i.e., GMM components) within composite robot trajectory model 305, i.e., the state sequence, which brings the system (including the robot and the objects) with the highest probability into the target configuration.
The determination of state sequence 307 also provides the sequence of movement skills that have to be executed under given situation 306. This is significant if branchings in sequence plan 303 are present and the control unit must therefore select different branches.
From state sequence 307, the control unit ascertains robot movement trajectory 308 to be followed. It may, for example, use a linear quadratic tracking (LQT) in order to ascertain the optimal trajectory.
When control unit 106 controls robotic arm 101 according to robot movement trajectory 308, it installs, as indicated by arrow 309, the primitive actions, which are to be carried out in the process of controlling according to sequence plan 303.
For this purpose, when a primitive action according to a movement skill is to be carried out, the state in state sequence 307 is identified which is the end state of the movement skill, i.e., the end state of the robot trajectory model of the movement skill. If this state is reached, the control according to state sequence 307 is interrupted, the primitive action (or multiple primitive actions if multiple according to the movement skill are to be executed) is carried out and then the control according to the state sequence is resumed.
In summary, a method according to various specific embodiments is provided, as represented in
Demonstrations for carrying out each of multiple movement skills are provided in 601.
In 602, a robot trajectory model for each movement skill is trained from the demonstrations.
In 603, a sequence plan for a task to be carried out by the robotic device is received, the sequence plan including a sequence to be carried out made up of the movement skills and primitive actions.
In 604, a reduced sequence plan is generated from the sequence plan by omitting the primitive actions.
In 605, a composite robot trajectory model is generated from the robot trajectory models of the movement skills for the reduced sequence plan by, if two movement skills according to the reduced sequence plan are to be carried out in succession, cascading the robot trajectory models of the movement skills.
In 606, the robotic device is controlled according to the composite robot trajectory model, the control of the robot according to the composite robot trajectory model being interrupted after the implementation of the movement skill, if after one movement skill according to the sequence plan one or multiple primitive action(s) are to be executed before the next movement skill, and the one or multiple primitive action(s) are executed and the control of the robot according to the composite robot trajectory model is then resumed.
In other words, according to various specific embodiments, the movement skills are drawn from the sequence plan for a manipulation task, which is to be carried out by a robotic device, and for these a composite movement model is generated. The robotic device is then controlled based on the composite movement model, the actions being carried out at the correct points of the control.
The primitive actions may be understood to be basic actions, i.e., as action foundations, which the robotic device manages, i.e., for example, without these having to be learned from the demonstrations. There may, in particular, be actions, which do not need to be adapted to the respective control situation. Examples are the activation of the end effector (such as, for example, gripper open or gripper closed), the recognition of objects in the operating area of the robot or the changing of parameters of the robot control (for example, compliance). In addition, the primitive actions have the characteristic of not having to change the position and orientation of the end effector, so that the calculated robot trajectory subsequently continues to be valid and may be resumed.
According to various specific embodiments, models for a robotic device are trained for multiple movement skills, and when a task is to be executed which includes multiple executions of these movement skills in branches or in a sequence, the models are cascaded and/or combined to form a composite model. The composite model may then be used for controlling the robot as if it were a model for one single movement skill, i.e., for example, by determining an optimal state sequence for the task (and the initial configuration of the robotic device and objects, at which the task is to be executed) and by controlling the robotic device accordingly.
The method of
The term “robotic device” may be understood as relating to any physical system (including a mechanical part whose movement is controlled), such as, for example, a computer-controlled machine, a vehicle, a household appliance, a power tool, a manufacturing machine, a personal assistant or an access control system.
Various specific embodiments may receive and use sensor signals from various sensors such as, for example, video, radar, LIDAR, ultrasound, movement, heat imaging, etc., for example, in order to obtain sensor data with respect to demonstrations or states of the system (robotic device and object or objects) and configurations and scenarios. The sensor data may be processed. This may include the classification of the sensor data or carrying out a semantic segmentation on the sensor data, for example, in order to detect the presence of objects (in the surroundings in which the sensor data have been obtained). Specific embodiments may be used for training a machine learning system and for controlling a robotic device, for example, autonomously by robot manipulators, in order to achieve various manipulation tasks under different scenarios. Specific embodiments are applicable, in particular, to the control and supervision of the execution of manipulation tasks, for example, in assembly lines. They may, for example, be seamlessly integrated into a conventional GUI for a control process.
Although specific embodiments are represented and described here, those skilled in the art in the field will recognize that the specific embodiments shown and described may be exchanged for a variety of alternative and/or equivalent implementations, without departing from the scope of protection of the present invention. This application is intended to cover any adaptations or variations of the specific embodiments that are discussed herein.
Number | Date | Country | Kind |
---|---|---|---|
10 2020 214 231.9 | Nov 2020 | DE | national |
The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 102020214231.9 filed on Nov. 12, 2020, which is expressly incorporated herein by reference in its entirety.