The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 102020208169.7 filed on Jun. 30, 2020, which is expressly incorporated herein by reference in its entirety.
It is desirable to solve complex assembly tasks using robotic automation with robot skills that were manually shown to the robot via kinesthetic teaching.
This may be achieved by a method of and device for operating a machine according to example embodiments of the present invention.
In accordance with an example embodiment of the present invention, the method of operating the machine comprises providing a sequence of skills of the machine for executing a task, selecting a sequence of states from a plurality of sequences of states, depending on a likelihood, wherein the likelihood is determined depending on a transition probability from a final state of a first sub-sequence of states of the sequence of states for a first skill in the sequence of skills to an initial state of a second sub-sequence of states of the sequence of states for a second skill in the sequence of skills. An optimal state sequence is found this way based on a desired goal of the task and an initial state of the machine or environment.
In accordance with an example embodiment of the present invention, the method may comprise providing a first model for the first skill, wherein a parameter of the first model defines a trajectory in a state of the first sub-sequence of states, or providing a first model for the second skill, wherein a parameter of the first model defines a trajectory in a state of the second sub-sequence of states, the method further comprising determining the parameter for the state of the sequence of states, and determining a control signal for operating the machine according to the trajectory depending on the parameter. The trajectory is a reference for an optimal control of the machine.
In accordance with an example embodiment of the present invention, the method may comprise providing a second model comprising the plurality of sequences of states, determining the likelihood for at least one sequence of the plurality of sequences depending on the transition probabilities between the sub-sequences of states for the skills in the at least one sequence, selecting the sequence of states from the second model having a higher likelihood compared to at least one other sequence of states of the plurality of sequences of states. The second model comprises cascades of first models that are composed for determining the optimal sequence of states using a Viterbi algorithm.
In accordance with an example embodiment of the present invention, the method may comprise mapping a first observation for the machine to a first state, wherein the sequence of states starts at the first state, and/or mapping a second observation to a second state, wherein the sequence of states ends at the second state. This allows the entire optimal sequence of states to be determined from two observations.
The first observation may characterize the machine or the environment of the machine before executing a first skill in the sequence of skills, and/or the second observation may characterize the machine or the environment of the machine after executing a last skill in the sequence of skills. This enables a simplified Viterbi algorithm based only on the initial and final observations of the task.
Providing the second model may comprise determining the transition probability from the final state of the first sub-sequence according to the first skill to a plurality of initial states of different second sub-sequences according to different instances of the second skill. This cascades different sub-sequences for the second skill with one sub-sequence of the first skill.
The final state of the first sub-sequence may be determined depending on a final component of an instance of the first skill. The instance of the first skill may be a TP-HSMM having multiple final components, i.e., different first sub-sequences. The final component defines the final state of one first sub-sequence and is hence linkable to the initial component of the initial state of the second sub-sequence.
At least one initial state of the second sub-sequence may be determined depending on an initial component of an instance of the second skill.
The transition probability may be determined depending on a divergence between a first Gaussian Mixture Model for the final component and a second Gaussian Mixture Model for the initial component.
The instance of the first skill and/or the instance of the second skill may be defined by a task-parameterized hidden semi-Markov model (TP-HSMM).
In accordance with an example embodiment of the present invention, a device for operating a machine is adapted to perform the steps of the example methods.
Further advantageous embodiments of the present invention can be derived from the following description and the figures.
The machine 104 may be a robot. The controller 102 may be adapted to control the machine 104 depending on information about a state of the machine 104 or its environment. The environment may include at least one object that is manipulatable by the machine 104. The machine 104 and/or its environment is referred to as the system herein.
One approach to solving complex assembly tasks is robotic automation with robot skills that were manually shown to the robot via kinesthetic teaching. Kinesthetic teaching is easy to perform by any worker who is familiar with an assembly process. Several demonstrations are collected for each unique robot skill, e.g., inserting a peg in a hole.
To capture the stochasticity of human demonstrations, the shown skills are encoded in statistical models, such as task-parameterized hidden semi-Markov models (TP-HSMMs).
According to the following description, a particular skill is encoded in a first model and sequencing is performed with different first models of different skills to determine a second model.
In the context of the description, model state refers to a Gaussian distribution of the TP-HSMM. System state refers to a robot state, for example, a position-orientation of its end-effector, or position and velocity of its end-effector. Observation refers to data used to train a skill, i.e. to train a TP-HSMM, or to reproduce a skill using a Viterbi Algorithm with an initial observation for the start of the skill and a target or goal observation for the end of the skill. These observations can correspond to the system state, or comprise it. Observations can include some additional variables that are specific for the problem.
The approach in accordance with an example embodiment of the present invention cascades multiple first models, e.g., model states encoded in TP-HSMMs, to build a larger second model corresponding to a multi-stage assembly procedure. For example, given a first model for a pick skill and a first model for a place skill, a second model for a pick-and-place skill is created that corresponds to a single TP-HSMM. The first model for a particular skill may encode multi-modal trajectory distributions for this particular skill to ensure generality and flexibility. For example, a pick skill may approach an object from a side, the top, or the bottom of the object.
According to this approach, a number of possible trajectory distribution modes is encoded in the cascaded, i.e., sequenced, second model. The number of possible trajectory distributions grows exponentially with the number of skills in the cascade.
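The growth in the number of modes can be illustrated with a minimal sketch. The function name and the branch counts per skill are hypothetical and chosen only for illustration; the sketch assumes that every mode of one skill can be combined with every mode of the next.

```python
from math import prod


def count_trajectory_modes(branches_per_skill):
    """Return the number of mode combinations in a cascade.

    Assumes (for illustration) that each mode of a skill can be
    followed by each mode of the subsequent skill.
    """
    return prod(branches_per_skill)


# Hypothetical cascade: pick with 3 approach modes, then two skills with 2 modes each.
modes = count_trajectory_modes([3, 2, 2])

# Five skills with 3 modes each already yield 3**5 combinations.
modes_five_skills = count_trajectory_modes([3] * 5)
```

With a constant number of modes per skill, the count is that number raised to the power of the number of skills, which is why an exhaustive enumeration of cascades quickly becomes impractical.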
According to this approach, it is determined which trajectory distribution mode the machine 104 should follow to fulfil a goal of the task. A framework will be described below that is able to find the most likely skill sequence instantiation that fulfils the goal.
According to an example, a cascaded model for pick-and-place is provided. Given the goal as a goal location for placing an object, a forward pass and a backward pass over the second model find the state sequence that most likely fulfils the goal compared to other state sequences that can be found according to the second model.
In an example, an orientation of an end-effector state of a robot is encoded and represented as a unit quaternion. The quaternion belongs to the $S^3$ hypersphere, which can be described as a Riemannian manifold.
In the example, the state is defined in a Riemannian manifold $\mathcal{M}_R = \mathbb{R}^3 \times S^3 \times \mathbb{R}^1$. The modeling technique TP-HSMM is adapted to handle variables in Riemannian manifolds.
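Handling variables on the hypersphere typically relies on mapping between the manifold and its tangent spaces. The following is a minimal sketch of the logarithmic and exponential maps on a unit sphere, assuming quaternions stored as 4-tuples; the function names are illustrative, and the sketch does not handle the quaternion double cover (q and -q describing the same rotation).

```python
import math


def normalize(q):
    """Scale a vector to unit length so that it lies on the sphere."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def log_map(p, q):
    """Tangent vector at p pointing along the geodesic towards q."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)  # geodesic distance on the unit sphere
    if theta < 1e-12:
        return tuple(0.0 for _ in p)
    # Part of q orthogonal to p, rescaled to length theta.
    ortho = tuple(b - dot * a for a, b in zip(p, q))
    norm = math.sqrt(sum(c * c for c in ortho))
    if norm < 1e-12:
        raise ValueError("log map is undefined for antipodal points")
    return tuple(theta * c / norm for c in ortho)


def exp_map(p, v):
    """Point reached from p by following tangent vector v along a geodesic."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-12:
        return tuple(p)
    return tuple(math.cos(norm) * a + math.sin(norm) * c / norm
                 for a, c in zip(p, v))


p = (1.0, 0.0, 0.0, 0.0)
q = normalize((1.0, 1.0, 0.0, 0.0))
v = log_map(p, q)
q_back = exp_map(p, v)
```

Mapping a point to the tangent space and back recovers the original point, which is the property that lets Gaussian statistics be computed in the tangent space.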
The controller 102 may be for linear quadratic tracking to solve a control problem, e.g. retrieve an optimal trajectory for a state sequence depending on information 108 about the state sequence. The controller 102 may be adapted according to a scalable finite-horizon linear quadratic tracking control algorithm for Riemannian manifolds.
The controller 102 may be an optimal Riemannian controller. In optimal linear quadratic tracking, the dynamics are defined to be linear in states and actions, and the cost function to minimize is quadratic in states and actions. Given the initial state of the system, the complete state sequence is defined as a function of the control actions at each time step. Furthermore, states and control actions can be concatenated at every time step into two large column vectors, and their dynamics relationship into a large square matrix. A quadratic cost is defined as a linear function of the control actions. Finally, the optimal control actions can be obtained by a matrix inversion.
However, matrix inversion scales cubically with the size of the matrix, which in the exemplary case is the number of time steps times the dimensionality of the states. This solution is derived purely from linear algebraic steps and allows the optimal control and state sequence to be computed in one batch.
Instead of solving the control problem in a batch form, the controller 102 may be implemented as an optimal feedforward and feedback controller recursively solving the control problem. The Bellman equation may be used as described below to solve the control problem using a cost-to-go function. Accordingly, an optimal action in a state decreases its cost-to-go function in the steepest direction. By exploiting this property of the cost-to-go function the optimal control actions are found recursively. In this example, the recursive formulation described below is used for Riemannian manifolds. This solution is less expensive to compute.
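The recursive idea can be sketched for a scalar system. This is a simplified Euclidean illustration under stated assumptions, not the Riemannian formulation of the embodiment: the dynamics x[t+1] = a*x[t] + b*u[t], the weights q and rho, and all names are hypothetical. The cost-to-go is kept in the form p*x^2 - 2*s*x + const and propagated backwards, which yields a feedback gain and a feedforward term per time step.

```python
def lqt_scalar(a, b, q, rho, reference, x0):
    """Finite-horizon scalar linear quadratic tracking by backward recursion.

    Cost: sum_t q * (x[t] - reference[t])**2 + sum_t rho * u[t]**2.
    Returns the state and control sequences of the closed loop.
    """
    T = len(reference) - 1
    p = [0.0] * (T + 1)  # quadratic coefficient of the cost-to-go
    s = [0.0] * (T + 1)  # linear coefficient of the cost-to-go
    p[T] = q
    s[T] = q * reference[T]
    gains = [0.0] * T    # feedback gains
    offsets = [0.0] * T  # feedforward terms
    for t in range(T - 1, -1, -1):
        denom = rho + p[t + 1] * b * b
        gains[t] = p[t + 1] * a * b / denom
        offsets[t] = b * s[t + 1] / denom
        p[t] = q + a * a * p[t + 1] - (p[t + 1] * a * b) ** 2 / denom
        s[t] = q * reference[t] + (a - b * gains[t]) * s[t + 1]
    xs, us = [x0], []
    for t in range(T):
        u = -gains[t] * xs[-1] + offsets[t]  # feedback plus feedforward
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us


# Track a constant reference of 1.0 over 20 steps, starting from 0.0.
xs, us = lqt_scalar(a=1.0, b=1.0, q=1.0, rho=0.01,
                    reference=[1.0] * 21, x0=0.0)
```

Each backward step costs a constant amount of work here, so the total effort grows linearly with the horizon instead of cubically with the size of a stacked matrix.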
In one aspect the controller 102 is adapted to control the machine 104 according to a reference trajectory corresponding to a state sequence provided depending on the cost-to-go function. The Viterbi algorithm may be used to determine the state sequence.
The optimal control solution is defined depending on the description of the system A, B for example as:
$P_t = \tilde{\Sigma}_k$
where
and wherein
defines a parallel transport operation between $x_t$ and $\hat{\mu}_k$
wherein R defines parameters of the control.
For the optimal control solution, A and B are generally dictated by the system dynamics model. The matrix R is usually designed manually.
The information 108 may be provided to the controller 102 by an apparatus 110 for planning states for the operation of the machine 104.
The apparatus 110 and the controller 102 are adapted to execute steps in the method described below for controlling the machine 104. The at least one processor 112 may be adapted to execute instructions stored in the at least one memory 114 to perform steps of the method to determine the information 108 for the at least one output 120 from at least one observation 118 received at the at least one input 116.
An observation 118 may be defined by a data point defining a state of the system. In a training, a plurality of observations 118 for performing a skill may be captured by at least one sensor from a human demonstration. Preferably, a spatial-temporal sequence of observations is captured while the human demonstrates the skill.
In the example the machine 104 is adapted to perform a plurality of different skills.
A set of demonstrations 122 may be recorded for the plurality of different skills.
Spatial coordinates of one spatial-temporal sequence of observations are in the example defined relative to a coordinate system having an origin assigned to one object. The spatial-temporal sequence for exercising a specific skill with a specific object may be recorded from the perspective of a specific coordinate system and transformed to a frame. In the example, the set of demonstrations 122 comprises a plurality of spatial-temporal sequences of observations recorded from a plurality of human demonstrations from different perspectives that are transformed into frames that are assigned to different objects. The demonstration in the example is a human demonstration. The demonstration may be a demonstration by a training device as well.
The apparatus 110 is adapted to determine a plurality of first models 124 depending on the set of demonstrations 122.
The apparatus 110 may be adapted to determine a TP-HSMM depending on a number of Gaussian components of a Gaussian Mixture Model, GMM.
The apparatus 110 is adapted to determine at least one task parameter 126-1 for the model 124-1, at least one task parameter 126-2 for the model 124-2 and at least one task parameter 126-3 for the model 124-3. The at least one task parameter 126-1 may comprise model parameters 126-11 for a state before executing the first skill (precondition model) and model parameters 126-12 for a prediction of a state after executing the first skill (predicted-final-state model). The apparatus 110 is adapted to determine these parameters depending on at least one frame determined by at least one observation for the first state of the system before executing the first skill. The at least one task parameter 126-1 may comprise further model parameters 126-13 for the state after executing the first skill (final-condition model). These may be implemented as GMMs. The apparatus 110 may be adapted to determine the further model parameters 126-13 depending on at least one frame determined by at least one observation for the state of the system after executing the first skill. The apparatus 110 is in the example adapted to determine these model parameters for the models 124-2 and 124-3 as well.
The apparatus 110 is adapted to determine the second model 128 depending on a sequence of skills 130 for performing a task 132. The task 132 may comprise manipulating an object with the machine 104. The task 132 is in the example provided by a user of the machine 104 by user input via the user interface.
For the exemplary task 132, the first skill 122-1, the second skill 122-2 and the third skill 122-3 may be executed in this order for performing the task 132. The apparatus 110 is in this case adapted to map the task 132 to this order of the three skills.
In the example, the apparatus 110 is adapted to determine a cascade of an instance 128-1 of the model 124-1 with a first instance 128-2 for the model 124-2 and a second instance 128-3 for the model 124-2.
The apparatus 110 is adapted to update the second model 128 according to the instance 128-1 of the model 124-1 and the first instance 128-2 of the model 124-2. The apparatus 110 is adapted to update the second model 128 according to the instance 128-1 of the model 124-1 and the second instance 128-3 of the model 124-2. More specifically, the second model 128 may be updated according to a first cascade comprising the instance 128-1 and the first instance 128-2 and according to a second cascade comprising the instance 128-1 and the second instance 128-3.
The second model 128 comprises in the example at least one sequence according to the first cascade and at least one sequence according to the second cascade. The first cascade comprises in the example different instances of the model 124-3 for a plurality of components of the model 124-2 and the second cascade comprises in the example different instances of the model 124-3 for a plurality of components of the model 124-2.
This means that for different components, the same sequence of skills may result in a plurality of different cascades. The second model 128 comprises a plurality of sequences of states.
The apparatus 110 is adapted to determine an optimal sequence of states ŝ* for the control of the machine 104 depending on a sequence of model states in the second model 128. The apparatus 110 in one aspect is adapted to apply a Viterbi algorithm to determine which sequence of model states in the second model 128 to choose as the optimal sequence of states ŝ*.
A method to determine the optimal sequence of states ŝ* is described below. A sequence of model states from the model 128 is considered optimal for example when it has assigned to it a highest likelihood of all likelihoods assigned to the available sequences of model states in the second model 128. It may be sufficient for the sequence of model states to have a higher likelihood than another sequence of model states for it to be useful. Hence the idea is not limited to the optimal sequence of states ŝ*.
The input to the method is a sequence of skills a*, an initial state x0, a target or goal state xG, a plurality of first models Θa for the skills a in the sequence of skills a* and their respective model parameters Γa.
The method comprises computing the second model 128, referred to as $\hat{\Theta}_{a^*}$ below, depending on the sequence a* of skills a1, a2, . . . by cascading TP-HSMMs for the skills a1, a2, . . . in their order in the sequence a*, starting with a first skill a1 and ending with a last skill of the sequence.
The example below describes the cascade of a TP-HSMM for the first skill Θa
The second model $\hat{\Theta}_{a^*}$ may include all possible cascades.
The TP-HSMM for the first skill Θa
Θa
where KL is a Kullback-Leibler divergence as described in J. R. Hershey and P. A. Olsen, “Approximating the Kullback Leibler divergence between Gaussian mixture models,” in Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 4. IEEE, 2007, pp. IV-317; where ΓT,a
$(\hat{\mu}_k^{(\hat{p})}, \hat{\Sigma}_k^{(\hat{p})}) = (\mu_k^{(p)}, \Sigma_k^{(p)}) \otimes b_k$
where ⊗ is defined as
where the parameters of the updated Gaussian at each frame p are computed as:
$\hat{\mu}_k^{(p)} = A^{(p)} \mu_k^{(p)} + b^{(p)}$
and
$\hat{\Sigma}_k^{(p)} = A^{(p)} \Sigma_k^{(p)} A^{(p)\top}$
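The update of a Gaussian under a frame transformation can be sketched as follows. The sketch assumes a 2-D state, a frame given by a matrix A and an offset b, and plain nested lists for matrices; the helper names and the example values are hypothetical.

```python
def matvec(A, v):
    """Multiply matrix A by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Multiply matrices A and B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Return the transpose of matrix A."""
    return [list(row) for row in zip(*A)]


def transform_gaussian(mu, sigma, A, b):
    """Apply mu_hat = A mu + b and sigma_hat = A sigma A^T."""
    mu_hat = [m + o for m, o in zip(matvec(A, mu), b)]
    sigma_hat = matmul(matmul(A, sigma), transpose(A))
    return mu_hat, sigma_hat


# Hypothetical frame: rotation by 90 degrees plus a translation.
A = [[0.0, -1.0], [1.0, 0.0]]
b = [2.0, 3.0]
mu = [1.0, 0.0]
sigma = [[4.0, 0.0], [0.0, 1.0]]
mu_hat, sigma_hat = transform_gaussian(mu, sigma, A, b)
```

In this example the rotation swaps the two variances, while the translation only shifts the mean and leaves the covariance unchanged.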
$\hat{\Gamma} = \{\hat{\Gamma}_1, \hat{\Gamma}_T, \hat{\Gamma}_{1T}\} = \{\Gamma_{1,a}$
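How a divergence between a final component of one skill and the initial components of the next skill can be turned into transition probabilities is sketched below. The sketch makes several simplifying assumptions that are not taken from the description: each component is represented by one 1-D Gaussian per frame rather than a full mixture, the weight is taken as exp(-KL) summed over frames, and the weights are normalized over the candidate initial components. All names are illustrative.

```python
import math


def kl_gauss_1d(m0, v0, m1, v1):
    """KL divergence KL(N(m0, v0) || N(m1, v1)) for 1-D Gaussians (v = variance)."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def transition_probabilities(final_component, initial_components):
    """Probabilities of moving from one final component to each initial component.

    final_component: list of (mean, variance), one entry per frame.
    initial_components: list of such lists, one per candidate component.
    """
    weights = []
    for candidate in initial_components:
        kl = sum(kl_gauss_1d(m0, v0, m1, v1)
                 for (m0, v0), (m1, v1) in zip(final_component, candidate))
        weights.append(math.exp(-kl))  # smaller divergence, larger weight
    total = sum(weights)
    return [w / total for w in weights]


# One frame; the first candidate matches the final component, the second does not.
final = [(0.0, 1.0)]
candidates = [[(0.0, 1.0)], [(3.0, 1.0)]]
probs = transition_probabilities(final, candidates)
```

A candidate whose per-frame Gaussians coincide with those of the final component has zero divergence and therefore receives the largest share of the probability mass.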
The method further comprises computing the optimal sequence of states, ŝ* below, by:
a) determining for all cascades of the second model $\hat{\Theta}_{a^*}$, depending on a given initial observation ξ0 and a given final observation ξT for the task that the optimal sequence shall perform, for each skill a in the cascade a likelihood
where $d_t(j)$ is the likelihood of being in model state j at step or time t and not being in model state j at step or time t+1 and
where $(\hat{\mu}_j, \hat{\Sigma}_j)$ is a global Gaussian component for the model state j in the model Θa for the skill a, and wherein model state j is a state of a sequence of model states in the second model.
In the example, the two arguments d and i that maximize the likelihood are recorded for the model state j in the cascade.
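For orientation, a commonly used form of such a forward recursion for hidden semi-Markov models is given below. It is an assumption offered for illustration and not necessarily the exact expression of this embodiment; the symbols $a_{ij}$ (transition probability from model state i to model state j), $p_j(d)$ (probability of staying d steps in model state j), $\mathcal{D}$ (set of admissible durations) and $\tilde{b}_j$ (emission term) are introduced here only for that purpose.

```latex
\begin{align}
d_t(j) &= \max_{d \in \mathcal{D}} \; \max_{i \neq j} \;
          d_{t-d}(i)\, a_{ij}\, p_j(d)
          \prod_{t'=t-d+1}^{t} \tilde{b}_j(\xi_{t'}), \\
\tilde{b}_j(\xi_{t'}) &=
\begin{cases}
\mathcal{N}\!\left(\xi_{t'} \mid \hat{\mu}_j, \hat{\Sigma}_j\right),
  & \text{if an observation is given at step } t' \text{ (first or last step)}, \\
1, & \text{otherwise}.
\end{cases}
\end{align}
```

In this form, the maximizing pair (d, i) is what would be stored for each model state j, so that a backward pass can later recover the preceding model state and the duration spent in j.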
The Viterbi algorithm is used to compute the most-likely sequence of states using a transition matrix and emission probabilities. The transition matrix is used as input to the forward pass of the Viterbi algorithm. Given a skill represented by an HSMM, the initial observation and a final observation, the two equations above are used to compute the most-likely sequence of HSMM states. This sequence of HSMM states corresponds to the TP-GMM Gaussians. This sequence of states is given for each time step of a time horizon. The time horizon can be specified, e.g., by an expert. That is, at each time step of the time horizon, a single TP-HSMM state is assigned as the most likely one.
b) determining in the second model $\hat{\Theta}_{a^*}$ the final model state j of the last skill in the sequence a* of skills a1, a2, . . . that has the highest likelihood.
c) determining a sequence of model states back to the first state by selecting the preceding model states of the final model state j of the last skill in the sequence a* of skills a1, a2, . . . that has the highest likelihood. In the example, the Viterbi algorithm is used to evaluate the likelihoods of a plurality of sequences of model states in the second model 128. In one aspect, the likelihoods are evaluated as the observation probability from the initial state to the goal state via the model states in the cascaded model. The sequence of model states that has the highest likelihood is then chosen as the optimal sequence.
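Steps a) to c) can be sketched with a simplified Viterbi pass. The sketch below is an illustration under stated assumptions and not the procedure of the embodiment: it uses an ordinary hidden Markov model without state durations, evaluates emissions only at the first and the last time step, and uses hypothetical names and a hypothetical five-state cascade in which one initial state branches into two grasp modes that lead to two different final states.

```python
def viterbi_endpoints(init, trans, emit_first, emit_last, horizon):
    """Most likely state path when only the first and last steps are observed."""
    n = len(init)
    delta = [init[j] * emit_first[j] for j in range(n)]
    back = []
    for t in range(1, horizon):
        new_delta, pointers = [], []
        for j in range(n):
            # Forward pass: best predecessor of state j.
            best_i = max(range(n), key=lambda i: delta[i] * trans[i][j])
            score = delta[best_i] * trans[best_i][j]
            if t == horizon - 1:
                score *= emit_last[j]  # goal observation at the last step
            new_delta.append(score)
            pointers.append(best_i)
        delta = new_delta
        back.append(pointers)
    # Backward pass: start at the most likely final state.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(delta)


# States: 0 approach, 1 grasp from top, 2 grasp from side, 3 place A, 4 place B.
init = [1.0, 0.0, 0.0, 0.0, 0.0]
trans = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
emit_first = [1.0, 1.0, 1.0, 1.0, 1.0]
emit_last = [0.0, 0.0, 0.0, 0.1, 0.9]  # goal observation favors place B
path, likelihood = viterbi_endpoints(init, trans, emit_first, emit_last, horizon=3)
```

Because the goal observation favors the final state that is only reachable through the side grasp, the backward pass selects the side-grasp branch even though both branches are equally likely in the forward direction up to the last step.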
The method further comprises controlling the machine 104 for executing each skill ah of the sequence a* of skills a1, a2, . . . by
a) Observing a current system state xh,0
b) Updating the parameters of the corresponding first model, i.e., the global Gaussian components $(\hat{\mu}_j, \hat{\Sigma}_j)$ for the model states j of a sub-sequence ŝh* of the optimal sequence ŝ* starting at the given current system state xh,0. This means that the sub-sequence ŝh* is the optimal sequence of model states for the skill ah.
c) Tracking the trajectory provided by the global Gaussian components $(\hat{\mu}_j, \hat{\Sigma}_j)$ for the model states j of the sub-sequence ŝh*. This means that the trajectory of the optimal sub-sequence ŝh* for the skill ah is tracked until the final model state for the skill ah.
Starting from the first skill a1 the latter steps a) to c) may be repeated until the goal of the task is achieved, i.e. until the last model state of the last skill in the sequence of a* has been executed.
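The repetition of steps a) to c) over the skills can be sketched as a simple loop. The three callables and the 1-D toy system below are hypothetical stand-ins introduced for illustration only; in particular, the tracking step here simply jumps to each reference mean instead of running a controller.

```python
def execute_skill_sequence(skills, observe, update_components, track):
    """Run each skill in order: observe, update its model states, track them."""
    log = []
    for skill in skills:
        x_h0 = observe()                             # a) current system state
        components = update_components(skill, x_h0)  # b) global Gaussians of the sub-sequence
        final_state = track(components)              # c) follow the reference trajectory
        log.append((skill, x_h0, final_state))
    return log


# Toy 1-D system with one target position per skill (assumed values).
state = {"x": 0.0}
targets = {"pick": 1.0, "place": 3.0}


def observe():
    return state["x"]


def update_components(skill, x0):
    # Two model states per skill as (mean, variance): a midpoint and the target.
    goal = targets[skill]
    return [((x0 + goal) / 2.0, 0.1), (goal, 0.1)]


def track(components):
    for mean, _variance in components:
        state["x"] = mean  # idealized tracking, stand-in for the controller
    return state["x"]


log = execute_skill_sequence(["pick", "place"], observe, update_components, track)
```

Observing the system state anew before each skill means that the reference for the next skill starts from where the previous skill actually ended, not from where it was predicted to end.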
An exemplary method of operating the machine 104 in accordance with the present invention is described below with reference to
The method comprises a step 502.
In step 502, a sequence a* of skills 130 of the machine 104 is provided for executing a task 132.
Afterwards a step 504 is executed.
Step 504 may comprise providing the TP-HSMM for the first skill. Step 504 may comprise providing the TP-HSMM for the second skill.
The parameters $(\hat{\mu}_j, \hat{\Sigma}_j)$ of the TP-HSMM define a trajectory in a model state j of the respective sub-sequence of model states for the respective skill.
Afterwards a step 506 is executed.
In Step 506 the second model 128 is provided. The second model 128 comprises the plurality of sequences of model states.
Determining the second model 128 may comprise determining the transition probability ak
The final model state of the first sub-sequence may be determined depending on a final component kf∈Θa
The at least one initial model state of the second sub-sequence may be determined depending on an initial component ki∈Θa
The transition probability ak
The instance of the first skill and/or the instance of the second skill may be defined by a respective TP-HSMM.
The method may comprise in a step 508 mapping a first observation ξ0 for the machine 104 to the first system state x0. The optimal sequence of states ŝ* may start at the first system state x0 matching the first observation ξ0.
The method may comprise in a step 510 providing the second system state xG. The second system state xG may be a mapping of a predetermined second observation ξT to the second state xG. The optimal sequence of states ŝ* ends at the second system state xG.
The first observation ξ0 may characterize the machine 104 or the environment of the machine 104 before executing a first skill in the sequence a* of skills 130. The second observation ξT may characterize the machine 104 or the environment of the machine 104 after executing a last skill in the sequence a* of skills 130.
Afterwards a step 512 is executed.
Step 512 comprises selecting the optimal sequence of states ŝ* from the plurality of sequences of model states of the second model 128, depending on the likelihood for the respective sequence of model states.
The likelihood is determined depending on the transition probability as described above.
This means that the likelihood for at least one sequence of the plurality of sequences is determined.
The sequence of model states that is selected from the second model 128 has a higher likelihood compared to at least one other sequence of model states of the plurality of sequences of states.
Afterwards a step 514 is executed.
In step 514, the parameters $(\hat{\mu}_j, \hat{\Sigma}_j)$ for a current model state j of the optimal sequence of states ŝ* are determined.
The parameters $(\hat{\mu}_j, \hat{\Sigma}_j)$ may be determined as the parameters of the TP-HSMM defining the trajectory of the model state j of the respective sub-sequence of states for the respective skill.
Afterwards a step 516 is executed.
Step 516 comprises determining a control signal û* for operating the machine 104 according to the trajectory depending on the parameters $(\hat{\mu}_j, \hat{\Sigma}_j)$. In the example, the optimal control solution is determined recursively as described above.
Afterwards, the step 514 is executed for a next state in the sequence of states until the final state j of the last skill in the sequence a* of skills is reached.
Number | Date | Country | Kind |
---|---|---|---|
102020208169.7 | Jun 2020 | DE | national |