This application claims priority under 35 U.S.C. § 119 to patent application no. 10 2022 203 410.4, filed on Jun. 4, 2022 in Germany, the disclosure of which is incorporated herein by reference in its entirety.
The present disclosure relates to methods used for controlling a robotic device.
A robotic device, such as a robotic arm, that is intended to remove objects from a container should be able to adapt to various situations, e.g. starting conditions. For example, if an object is on the edge rather than in the center of a container, the robot should also be able to pick up the object. However, the robot may need to perform an additional action to this end, e.g. push the object to the center.
Approaches are therefore required for controlling a robotic device that enable successful and efficient control in various situations.
According to various embodiments, a method for controlling a robotic device is provided, comprising the following for each control vector from a plurality of control vectors: controlling the robotic device to perform a sequence of actions, wherein the control vector indicates which action is to be performed for a respectively observed control situation; and determining, from the performed sequence of actions, values of multiple target metrics that evaluate the performance of a specified task. The method further comprises adjusting a probability distribution of the control vectors such that the probability of control vectors is increased for which the specified task was performed with high evaluations and the multiple target metrics have satisfied at least one target condition; randomly selecting a control vector according to the probability distribution for performing the task in a current control scenario; and controlling the robotic device to perform a sequence of actions, wherein the control vector indicates which action is to be performed for a respectively observed control situation.
The method described above allows control of the robotic device that conforms to the circumstances in which it is deployed. Each control vector can be considered a representation of a control strategy, and control vectors (or similar control vectors) that result in successful control (in the sense of satisfying the at least one target condition and of good results with respect to the target metrics) are increasingly preferred over time, so that the control unit increasingly adapts to its deployment conditions.
Various embodiment examples are specified hereinafter.
Embodiment example 1 is a method used for controlling a robotic device, as described above.
Embodiment example 2 is the method according to embodiment example 1, wherein, for at least some of the actions, the control vector indicates parameters of the actions for the respectively observed control situation.
Thus, the machine-learning model (represented by the probability distribution that is adjusted) not only learns good sequences of actions (or decisions that lead to a good sequence of actions in terms of the target metrics depending on the control scenario and the control situation) but also how the actions (in a control situation that has currently been observed) are to be parameterized. This in particular increases the degrees of freedom in the optimization of the control and thus allows for a better optimization result.
Embodiment example 3 is the method according to embodiment example 1 or 2, wherein the task comprises picking up an object from a container.
In particular, in such an application, the selection of actions taken is critical to how well the task is performed, for example because a decision must be made as to whether an object can be gripped right away or must first be moved. By adjusting the probability distribution based on successful control runs, the respective control unit that performs the method learns to adapt to different scenarios such as object types, grippers, and lighting situations and make appropriate decisions for them.
Embodiment example 4 is the method according to any one of embodiment examples 1 to 3, wherein the adjustment of the probability distribution is carried out by means of a gradient-free optimization process.
An example of this is CMA-ES (Covariance Matrix Adaptation Evolution Strategy). This allows the parameters of the machine-learning model (specifically the probability distribution) to be optimized with respect to multiple target metrics while taking the target conditions into account, because no single real-valued evaluation needs to be present (i.e. not, for example, a single real value of a loss function); instead, optimization can also be performed with respect to a vector-valued target (the values of multiple target metrics).
Embodiment example 5 is a robotic device control unit configured so as to carry out a method according to any one of embodiment examples 1 to 4.
Embodiment example 6 is a computer program comprising commands that, when executed by a processor, cause said processor to perform a method according to any one of embodiment examples 1 to 4.
Embodiment example 7 is a computer-readable medium which stores commands that, when executed by a processor, cause said processor to perform a method according to any one of the embodiment examples 1 to 4.
In the drawings, like reference numbers generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, wherein emphasis is instead generally placed on representing the principles of the disclosure. In the following description, various aspects are described with reference to the following drawings.
The following detailed description relates to the accompanying drawings, which, for clarification, show specific details and aspects of this disclosure in which the disclosure can be implemented. Other aspects can be used, and structural, logical, and electrical changes can be made without departing from the scope of protection of the disclosure. The various aspects of this disclosure are not necessarily mutually exclusive, because some aspects of this disclosure can be combined with one or more other aspects of this disclosure to form new aspects.
Different examples will be described in further detail in the following.
The robot 100 comprises a robotic arm 101, for example an industrial robot arm for handling or assembling a workpiece (or one or more other objects). The robotic arm 101 comprises manipulators 102, 103, 104 and a base (or support) 105, by means of which the manipulators 102, 103, 104 are supported. The term “manipulator” refers to the movable elements of the robotic arm 101, the actuation of which enables physical interaction with the environment, e.g. in order to perform a task. For control, the robot 100 comprises a (robot) control unit 106 configured so as to implement the interaction with the environment according to a control program. The last element 104 (farthest from the support 105) of the manipulators 102, 103, 104 is also referred to as the end effector 104 and can comprise one or more tools, e.g. a welding torch, a gripper, a paint tool, or the like.
The other manipulators 102, 103 (which are closer to the support 105) can form a positioning device so that the robotic arm 101 is provided with the end effector 104 at its end. The robotic arm 101 is a mechanical arm that can provide functions similar to those of a human arm (possibly with a tool at its end).
The robotic arm 101 can comprise articulation elements 107, 108, 109 connecting the manipulators 102, 103, 104 to one another and to the support 105. An articulation element 107, 108, 109 can comprise one or more articulation joints that can each provide rotary movement (i.e. rotational movement) and/or translational movement (i.e. displacement) for associated manipulators relative to one another. The movement of the manipulators 102, 103, 104 can be initiated by means of actuators controlled by the control unit 106.
The term “actuator” can be understood to mean a component that is designed to influence a mechanism or process in response to being driven. The actuator can convert instructions output by the control unit 106 (referred to as “activation”) into mechanical movements. The actuator, e.g. an electromechanical converter, can be designed to convert, upon its activation, electrical energy into mechanical energy.
The term “control unit” can be understood to mean any type of logic implemented by an entity including, e.g. a circuit and/or a processor capable of executing software, firmware, or a combination of both that is stored in a storage medium, and which can issue commands, e.g. to an actuator in the present example. For example, the control unit can be configured by means of a program code (e.g. software) in order to control the operation of a robotic device.
In the present example, the control unit 106 comprises one or more processors 110 and a memory 111 that stores code and data, based on which the processor 110 controls the robotic arm 101. According to various embodiments, the control unit 106 controls the robotic arm 101 on the basis of a machine-learning model 112 stored in the memory 111. For example, the robot 100 is intended to manipulate an object 113.
At any given time, the overall system composed of robotic arm 101 and object 113 (or other objects) to be manipulated has a particular state with respect to position, orientation, end effector state (gripper open/closed), etc. This state of the system, robot, or object is hereinafter referred to as the control configuration.
A manipulation task performed by, for example, a robotic arm 101 can typically be broken down into a sequence of skills. Such skills are also referred to as primitives or generally as actions. Primitives can be considered elementary skills of the particular robotic device.
It is assumed that for each primitive, the robot 100 has a control function that allows the control unit 106 to control the robotic arm to perform the primitive.
For example, for a primitive that corresponds to a skill in movement, a statistical model (e.g. a Hidden Semi Markov Model) can be trained by learning from demonstrations (LfD). Also for a primitive, such as determining a gripping pose for picking up an object, a model can be learned using machine-learning. For simpler primitives, such as capturing an image of the object by a camera 114, such a control function can also be programmed directly without training a model.
Each primitive is a configurable function that has an effect on the controlled system (in the example of
In the following, it is assumed that the manipulation task is to pick up an object 113, e.g. from a container 115 (so-called “bin picking”).
Examples of primitives associated with such a manipulation task are
A sequence of primitives for sorting multiple objects is, for example:
Even with one set of primitives, multiple different sequences of primitives can be required for the same type of task (e.g. transporting objects from one container to another) in different scenes. For example, if the object is too close to the corner, it must first be pushed out of the corner. Such sequences can be defined manually, with fine-tuned rules as to which sequence to select in which scenario.
For example, the above sequence can be fixedly programmed in the control unit 106. However, for more complicated manipulation tasks, the resulting control is often not optimal. According to various embodiments, there is therefore provided a control method (and corresponding control unit) that optimizes the sequence of primitives and their parameters (in particular during the run time). The controlled technical system (robot 100 in the bin-picking example) is thereby enabled to adapt to different application cases, e.g. different object types, gripper types, etc., as well as different ambient conditions such as lighting (which affects primitives such as the gripping point detection).
Such an online optimization (optimization at run time, i.e. during operation) can be realized through black-box optimization approaches. These include Bayesian Optimization (BO) and CMA-ES (Covariance Matrix Adaptation Evolution Strategy). According to various embodiments, CMA-ES is used, because it works better with a large number of samples (such as result from previous control passes), which is an important aspect when continuously optimizing parameters during operation.
The control unit (e.g. control unit 106) accesses a quantity of primitives 202 for which there are control functions (e.g. in control unit 106) according to an ML model 201 (e.g. ML model 112) trained for a target manipulation task. For each phase of the control, it selects a primitive 202 in 203 and sets parameters for the selected primitive in 204. The control unit 106 selects a primitive 202 only if all of its preconditions (if it has any) are met; it can check this before selecting a primitive. In 205, the control unit 106 applies the selected and configured primitive to the respective controlled system 206 (e.g. the robotic arm 101 and, if applicable, other controlled elements, such as the camera 114) according to the parameters set. In 207, the controlled system 206 supplies sensor data to the control unit (in particular to the ML model 112), e.g. image data or also measurement data of a force sensor.
Each primitive 202 has an effect on the controlled system 206, e.g. it provides an image, the robotic arm 101 moves, the object 113 is gripped, etc. This effect can be sensed by sensors of the system (camera, robot sensor measurements, detection of a successful grasp, etc.) and is provided as feedback via the sensor data in 207 to the ML model 201. By collecting this feedback (i.e. the results that a primitive has with certain parameter values) and training based on this feedback, the ML model 201 can learn the effect of primitives (with associated parameters) and, when trained in this manner for a sufficiently long time, select the primitive and its parameters for the highest benefit in any control situation (which it recognizes from the sensor data fed back to it).
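The select-apply-feedback loop described above can be sketched as follows. This is an illustrative toy sketch, not the disclosed implementation: the primitive names, the dictionary-based state, and the stand-in selection policy are all assumptions made for demonstration.

```python
# Hypothetical primitives: each has a precondition on the observed state
# and an effect on the controlled system (both are illustrative).
PRIMITIVES = {
    "capture_image": {"precond": lambda s: True,
                      "apply": lambda s: {**s, "image": True}},
    "detect_grasp":  {"precond": lambda s: s.get("image", False),
                      "apply": lambda s: {**s, "grasp_pose": True}},
    "pick_up":       {"precond": lambda s: s.get("grasp_pose", False),
                      "apply": lambda s: {**s, "picked": True}},
}

def control_step(state, choose):
    """One control phase: select a primitive whose preconditions are met,
    apply it to the controlled system, and return the resulting feedback."""
    feasible = [n for n, p in PRIMITIVES.items() if p["precond"](state)]
    name = choose(feasible, state)  # stands in for the ML model's decision
    return name, PRIMITIVES[name]["apply"](state)

# A trivial stand-in policy that always takes the most advanced feasible
# primitive; three steps then complete the pick-up.
state = {}
for _ in range(3):
    name, state = control_step(state, lambda feas, s: feas[-1])
print(name, state)
```

In a trained system, the `choose` callback would be replaced by the ML model 201, which maps the fed-back sensor data to the next primitive and its parameters.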
For example, the ML model 201 can learn
According to various embodiments, a control unit does not use a firmly programmed sequence of primitives, but instead trains a model that decides which primitive to execute next (and how to configure it, i.e. how to set its parameters). The model obtains the current state of the controlled system (including environment such as manipulated objects) as input (e.g. as one or more images, robotic state data, meta-information regarding the task) and outputs the next primitive to be performed and its parameters. In this way, the model can adjust the sequence of primitives to the state of the controlled system and perform the task more efficiently.
Between a starting state 301 and an end state 302, the following primitives can be performed: a picking up 303, a movement 304, a reorientation (e.g. at the current location) 305, a scanning (of the barcode) 306, and a dropping at the destination 307.
As described above, the ML model 201 receives feedback on configured primitives (i.e. primitives with associated parameter values) that specifically enables it to assess how well a primitive or sequence of primitives is or has been capable of achieving the particular objective. According to various embodiments, a configured primitive or sequence of primitives is evaluated by means of multiple target metrics.
For example, a sequence of configured primitives is performed, and the force exerted by the robot on the object 113 is measured as part of the feedback. A target metric is, for example, the measured force (which should be as low as possible), and a condition with respect to this target metric can be that the force does not exceed a specified limit. A further target metric can be the deviation of the achieved end position from the destination. This also provides a rating (as close to the destination as possible), and a condition can likewise be specified for this purpose, e.g. the object may be at most a certain maximum distance from the destination.
Accordingly, for a control sequence, i.e. a sequence of (configured) primitives from starting state to end state, the control unit can determine values of the target metrics, wherein each target metric has an associated optimization target (e.g. as low a force as possible, as close to the destination as possible) and can have a condition (e.g. the force must remain below the limit). Such a condition may or may not be satisfied for a control run.
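The evaluation of one control run by multiple target metrics with associated conditions can be sketched as follows. The function name, units, and the concrete limits are hypothetical, chosen only to illustrate the force and deviation metrics discussed above.

```python
def evaluate_run(measured_force_n, deviation_mm,
                 force_limit_n=10.0, max_deviation_mm=5.0):
    """Determine target-metric values for one control sequence and check
    whether the associated conditions are satisfied. Both metrics here
    follow a "lower is better" optimization target; the limits are
    illustrative assumptions, not values from the disclosure."""
    metrics = {"force": measured_force_n, "deviation": deviation_mm}
    conditions_met = (measured_force_n <= force_limit_n
                      and deviation_mm <= max_deviation_mm)
    return metrics, conditions_met

metrics, ok = evaluate_run(measured_force_n=7.2, deviation_mm=3.1)
print(metrics, ok)  # both conditions are satisfied for this run
```

The pair of metric values and the condition flag is exactly the information the iterative adjustment of the probability distribution consumes: the values drive the sorting, the flag drives underweighting or exclusion.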
Each sequence of primitives that is performed results from which primitives the ML model 201 selects for a particular situation (i.e. a state of the controlled system). The ML model 201 makes these decisions according to a vector of values for a set of control parameters. This vector is also referred to as a control vector. The control vector specifies decision parameter values for decision conditions, for example that the object is moved when it is closer than x mm to the edge of the container 115, i.e. the ML model 201 makes a particular decision based on the values of the entries of the vector associated with that decision.
Thus, the control vector for an instance of the task, i.e. for the task in a particular control scenario (i.e. starting configuration and target configuration), defines a particular control flow (i.e. a sequence of primitives with configuration parameters of the primitives), because it defines the decisions as to which primitives (in response to feedback) are selected and their configuration.
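A minimal sketch of how entries of a control vector can parameterize such decision conditions is given below. The threshold rule mirrors the edge-distance example above; the function and key names are illustrative assumptions.

```python
def decide_next_primitive(control_vector, observed_state):
    """Entry 0 of the control vector is taken as the edge-distance
    threshold (in mm) below which the object is first moved to the
    center before grasping; the rule and names are hypothetical."""
    edge_threshold_mm = control_vector[0]
    if observed_state["edge_distance_mm"] < edge_threshold_mm:
        return "move_object"
    return "pick_up"

v = [20.0]  # a sampled control vector (one decision parameter here)
print(decide_next_primitive(v, {"edge_distance_mm": 12.0}))  # move_object
print(decide_next_primitive(v, {"edge_distance_mm": 35.0}))  # pick_up
```

Because the same vector `v` is consulted at every decision point of a control run, one sampled control vector fixes the whole control flow for a given scenario, which is what makes the vector itself the object of optimization.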
The ML model samples the control vector from a space of control vectors. According to various embodiments, the machine-learning model 201 learns, from control sequences (for which it receives evaluations or determines them from feedback), a probability distribution of the control vectors, according to which it subsequently samples from the space of control vectors. This is carried out, for example, in accordance with CMA-ES.
The probability distribution is adjusted over time (via multiple control sequences) such that control vectors that resulted in control sequences for which the target conditions were met and for which the target metrics assumed good (i.e. high or low, depending on the target metric) values are more likely to be sampled.
The adjustment of the probability distribution is iterative, wherein for each iteration a quantity of control vectors is evaluated and then sorted. The sorting can account for the target metrics (e.g. by a weighted combination (e.g. sum) of the target metrics, so that a value can be determined for each control vector and the control vectors can be sorted based on these values); control vectors that do not satisfy the target conditions (e.g. for which at least one target metric lies outside of an allowable range) can be underweighted or excluded from the sorting. According to the sorting, a portion of the control vectors is then selected (e.g. the n best) and the probability distribution is adjusted in favor of these vectors. In the case of normally distributed sampling, this includes shifting the expected value of the distribution towards the mean of the selected vectors and adjusting the covariance matrix according to their scatter. In addition, the previous parameters of the probability distribution can be taken into account in order to ensure a more stable adjustment of the distribution.
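The iterative scheme above (sample, exclude condition violators, sort by a weighted metric combination, select the best, blend the distribution parameters) can be sketched as follows. This is a simplified evolution-strategy sketch in the spirit of CMA-ES, not the full algorithm; the objective, the feasibility limit, and the blending factors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(v):
    # Weighted combination of two target metrics (lower is better),
    # stand-ins for e.g. exerted force and deviation from the destination.
    return 1.0 * v[0] ** 2 + 2.0 * (v[1] - 1.0) ** 2

def feasible(v):
    # Hypothetical target condition: the "force" metric stays below a limit.
    return v[0] ** 2 < 25.0

mean, cov = np.zeros(2), np.eye(2) * 4.0
for _ in range(30):
    # Sample a population of control vectors from the current distribution.
    pop = rng.multivariate_normal(mean, cov, size=20)
    # Exclude vectors violating the target condition, sort the rest.
    valid = [v for v in pop if feasible(v)]
    valid.sort(key=objective)
    elite = np.array(valid[: max(2, len(valid) // 4)])
    # Shift the mean towards the elite and adapt the covariance to their
    # scatter, blending with the previous parameters for stability.
    mean = 0.5 * mean + 0.5 * elite.mean(axis=0)
    cov = 0.5 * cov + 0.5 * np.cov(elite.T) + 1e-6 * np.eye(2)

print(mean)  # the mean moves towards the optimum near [0, 1]
```

A production system would use a full CMA-ES implementation (with step-size control and rank-based weighting) rather than this plain mean/covariance blend, but the structure of one iteration is the same.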
In other words, the optimization of the target metrics is accomplished by learning the probability distribution, which, in the course of operation, makes control vectors that lead to good results (in terms of the target metrics) more and more likely.
This adjustment of the probability distribution can be performed in operation via control runs. After a certain number of control runs (or based on a further criterion, e.g. reliable control over a certain number of control runs), the training can be ended, i.e. the probability distribution can be kept the same or adjusted more rarely.
Thus, over time, the ML model increasingly selects control vectors that will result in sequences of primitives for which the machine-learning model (according to its past experience) expects the conditions of the target metrics to be fulfilled and the optimization targets of the target metrics to be fulfilled well.
In summary, a method is provided according to various embodiments, as shown in
In 401, for each control vector from a plurality of control vectors, the robotic device is controlled to perform a sequence of actions, wherein the control vector indicates which action is to be performed for a respectively observed control situation, and values of multiple target metrics that evaluate the performance of a specified task are determined from the performed sequence of actions.
In 403, a probability distribution of the control vectors is adjusted such that the probability of control vectors is increased for which the specified task was performed with high evaluations and the multiple target metrics have satisfied at least one target condition.
In 404, a control vector is randomly selected according to the probability distribution for performing the task in a current control scenario.
In 405, the robotic device is controlled to perform a sequence of actions, wherein the control vector indicates which action is to be performed for a respectively observed control situation.
The method of
The approach shown in
Various embodiments can receive and use sensor signals from various sensors, such as video, radar, LiDAR, ultrasonic, motion, or thermal-imaging sensors, for example, in order to obtain sensor data regarding demonstrations or states of the system (robot and object or objects) as well as configurations and scenarios. The sensor data can be processed. This can include the classification of the sensor data or performing semantic segmentation on the sensor data, for example, in order to detect the presence of objects (in the environment in which the sensor data was obtained). Embodiments can be used in order to train a machine-learning system and to control a robot, e.g. robot manipulators, autonomously, in order to accomplish various manipulation tasks under various scenarios. The embodiments are especially applicable to the control and monitoring of the performance of manipulation tasks, e.g. in assembly lines.
For example, the robotic device is a robotic arm that can be used in order to pick up and, if necessary, inspect an object. In this case, the sensor data based on which the robotic arm is controlled comprises, for example, digital color images (RGB images) and depth images (if necessary, in combination, i.e. RGB+D images).
Although specific embodiments have been illustrated and described here, those skilled in the art will recognize that the specific embodiments shown and described may be exchanged for a variety of alternative and/or equivalent implementations without departing from the scope of protection of the present disclosure. This application is intended to cover any adaptations or variations of the specific embodiments discussed here. Therefore, it is intended that the present disclosure be limited only by the claims and equivalents thereof.
Number | Date | Country | Kind |
---|---|---|---|
10 2022 203 410.4 | Apr 2022 | DE | national |