The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 10 2023 207 208.4 filed on Jul. 27, 2023, which is expressly incorporated herein by reference in its entirety.
The present invention relates to methods for training a machine learning model for controlling a robot to manipulate an object.
Picking up (i.e., gripping) an object is an important problem in robotics. Recent research uses machine learning methods to make model-free gripping of unknown objects in unstructured environments possible. Many approaches focus on solving the gripping problem with three degrees of freedom (DoF) for parallel grippers and single suction grippers and output a gripping position along with a success metric. Although this simplifies the gripping problem, it requires assumptions about the orientation of the gripper, which, for example, restricts parallel grippers to a top-down gripping execution (i.e., approach direction) and additionally restricts the application of these approaches to specific gripper types.
Approaches to automatically controlling a robot to pick up (or generally manipulate) an object that allow flexibility with regard to the approach direction of the gripper are therefore desirable.
The paper De Cao, N. and Aziz, W., “The power spherical distribution,” 2020, arXiv preprint arXiv:2006.04437, hereinafter referred to as Reference 1, describes the power spherical distribution.
According to various embodiments of the present invention, a method for training a machine learning model for controlling a robot (to manipulate, in particular pick up, an object) is provided, comprising:
The set of training data elements is typically a batch of several training data elements (but, in the extreme case, a single training data element).
The method of the present invention described above makes it possible to train a machine learning model to densely predict gripping contact point pairs (e.g., together with values for gripper opening width) and gripping qualities for several gripping orientations (and thus approach directions) per contact point pair. The dense prediction of gripping points with several orientations (where appropriate, orientations and/or contact points that are close to one another, hence “dense”) makes it possible to more successfully grip objects even under additional constraints (e.g., reachability of the robot): The dense prediction of several orientations per contact point pair of the parallel grippers makes the approach very flexible in situations that are restricted to particular gripper orientations, e.g., when removing an object from a container with narrow gripping spaces and possible limitations of the kinematic reachability by the robot. Here, additional gripper orientations can make gripping objects possible despite these limitations and can increase the number of grippable objects in these scenarios.
With the approach of the present invention described above, gripping processes can also be predicted for objects that are not (completely) visible, which is important in scenarios with only one camera for parallel gripping (for example, when removing an object from a container): The prediction of contact point pairs is also possible if the contact points are not (both) visible image points, which makes gripping possible at points that are not visible in the camera image. This is particularly important in gripping scenarios with only one view, such as when removing an object from a container, where the camera view is typically restricted to a view from above due to the container geometry; only a single-view camera image, in which the object to be gripped is captured only partially and from one or more sides depending on the object geometry, is thus available as an input for the machine learning model. However, for parallel grippers, it is necessary to predict contact points on two opposite sides of an object geometry, which usually cannot be captured from a single camera view. The prediction of gripping points even on non-visible parts of the object is therefore crucial in order to make it possible to successfully grip a large number of objects in such a single-view scenario with limited spatial accessibility (such as when removing objects from a container).
The prediction of a collision-free grip (gripping pose (including approach direction) plus gripper opening width) makes it possible to directly execute the predicted grip without the need for additional collision checks.
Various exemplary embodiments of the present invention are specified below.
Exemplary embodiment 1 is a method for training a machine learning model for controlling a robot, as described above.
Exemplary embodiment 2 is a method according to exemplary embodiment 1, wherein each training data element contains at least one direction vector between contact points (as ground truth) (e.g., two points corresponding to the two points at which the end effector grips the object in a respective grip), and wherein the loss furthermore contains a basis-vector loss component for at least one of the ascertained contact points per training data element, which basis-vector loss component decreases with increasing probability that a spherical distribution ascertained by the machine learning model for the basis vectors of an ascertained contact point matches the spherical distribution of the basis vectors assigned to an ascertained contact point and contained in the training data element.
The machine learning model is thus also trained to correctly predict the direction vector between the contact points (herein also referred to as the basis vector).
Exemplary embodiment 3 is a method according to exemplary embodiment 1 or 2, wherein each training data element comprises one or more contact points (as ground truth), and wherein the method comprises ascertaining, for each training data element and for each ascertained contact point, an associated partner contact point (i.e., a total of one contact point pair, in particular for gripping with a parallel gripper, wherein the partner contact point for a contact point results from the opening width in the direction of the basis vector from the contact point), and wherein the loss furthermore comprises a width loss component per training data element and per ascertained contact point, which width loss component decreases with decreasing distance of the ascertained associated partner contact point to the one or more partner contact points of the training data element.
The machine learning model is thus trained to correctly predict pairs of contact points.
Exemplary embodiment 4 is a method according to one of exemplary embodiments 1 to 3, wherein each training data element (as ground truth) comprises at least one contact-point quality rating, and the method comprises ascertaining, via the machine learning model, a quality rating for each ascertained contact point, and wherein the loss furthermore comprises a quality loss component per training data element and per ascertained contact point, which quality loss component increases with increasing difference between the quality rating ascertained for the ascertained contact point and a contact-point quality rating (e.g., quality ratings of one or more contact points neighboring the ascertained contact point, for which the training data element comprises a contact-point quality rating) that the training data element comprises for an associated contact point.
The machine learning model is thus trained to correctly predict qualities of contact points (and thus, for example, of grips).
Exemplary embodiment 5 is a method according to one of exemplary embodiments 1 to 4, wherein each training data element (as ground truth) comprises at least one (ground truth) position of a contact point on the surface of an object to be manipulated and at least one position of a reference point (e.g., gripping center) of the end effector, and the method comprises classifying, via the machine learning model, spatial regions into spatial regions with contact point and without contact point as well as with reference point and without reference point, and wherein the loss furthermore comprises, per training data element, a classification loss of the classification as a contact-point reference-point loss component. The classification may (as is usual for classification tasks) include the output of soft values, which are used to ascertain the classification loss. The machine learning model is thus trained to predict a suitable position and orientation of the end effector.
Exemplary embodiment 6 is a method for controlling a robot to manipulate an object to be manipulated, comprising:
In this way, the robot can be controlled without collision.
Exemplary embodiment 7 is a method according to exemplary embodiment 6, wherein the machine learning model is trained according to one of exemplary embodiments 1 to 5.
Exemplary embodiment 8 is a robot control apparatus configured to carry out a method according to one of exemplary embodiments 1 to 7.
Exemplary embodiment 9 is a computer program comprising instructions that, when executed by a processor, cause the processor to carry out a method according to one of exemplary embodiments 1 to 7.
Exemplary embodiment 10 is a computer-readable medium storing instructions that, when executed by a processor, cause the processor to carry out a method according to one of exemplary embodiments 1 to 7.
In the figures, similar reference signs generally refer to the same parts throughout the various views. The figures are not necessarily true to scale, with emphasis instead generally being placed on the representation of the principles of the present invention. In the following description, various aspects of the present invention are described with reference to the figures.
The following detailed description relates to the figures, which show, by way of explanation, specific details and aspects of this disclosure in which the present invention can be executed. Other aspects may be used and structural, logical, and electrical changes may be performed without departing from the scope of protection of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive, since some aspects of this disclosure may be combined with one or more other aspects of this disclosure to form new aspects.
Various examples are described in more detail below.
The robot 100 includes a robot arm 101, for example an industrial robot arm for handling or assembling a work piece (or one or more other objects). This serves only as an example here, and the approach described below is not limited to the execution of a gripping process with a robot arm but can also be used for other robot kinematics (e.g., parallel kinematic robots). The robot arm 101 includes movable arm elements 102, 103, 104 and a base (or support) 105, which supports the arm elements 102, 103, 104. The term “movable arm elements” refers to the movable components of the robot arm 101, the actuation of which makes physical interaction with the environment possible, for example in order to perform a task. For control, the robot 100 includes a (robot) control device 106, which is designed to implement the interaction with the environment according to a control program.
The last arm element 104 (which is farthest away from the support 105) of the arm elements 102, 103, 104 is also referred to as the end effector 104 and may include one or more tools, such as a welding torch, a gripping tool, a painting device, or the like.
The other arm elements 102, 103 (located closer to the support 105) can form a positioning apparatus so that, together with the end effector 104 at its end, the robot arm 101 is provided. The robot arm 101 is a mechanical arm (possibly with a tool at its end).
The robot arm 101 can include joint elements 107, 108, 109 which connect the arm elements 102, 103, 104 to one another and to the support 105. A joint element 107, 108, 109 can have one or more joints, which can each provide a rotatable movement (i.e., rotational movement) and/or translational movement (i.e., displacement) for associated arm elements relative to one another. The movement of the arm elements 102, 103, 104 can be initiated by means of actuators controlled by the control device 106.
The term “actuator” may be understood as a component that is designed to bring about a mechanism or process in response to being driven. The actuator can implement instructions (called activation) generated by the control device 106 as mechanical movements. The actuator, for example an electromechanical converter, can be designed to convert electrical energy into mechanical energy in response to its activation.
The term “control device” can be understood as any type of logic-implementing entity (including one or more computers) that may, for example, include a circuit and/or processor that is capable of executing software, firmware or a combination thereof stored in a storage medium, and that can issue instructions, for example to an actuator in the present example. The control device can be configured for example by program code (e.g., software) to control the operation of a system, in the present example a robot.
In the present example, the control device 106 includes one or more processors 110 and a memory 111 that stores code and data, on the basis of which the processor 110 controls the robot arm 101. According to various embodiments, the control device 106 controls the robot arm 101 based on a machine learning model 112 stored in the memory 111.
According to various embodiments, the machine learning model 112 is designed and trained to make it possible for the robot 100 to recognize manipulation poses on one or more objects 113 where the robot 100 can pick up (or otherwise interact with, e.g., paint) the object(s) 113.
The robot 100 may, for example, be equipped with one or more cameras 114 that enable it to record images of its working space. The camera 114 is fastened for example to the robot arm 101 so that the robot can capture images of the object 113 from various perspectives by moving its robot arm 101. However, the camera 114 can also be fixedly mounted in a robot cell, as shown in
According to various embodiments, the machine learning model 112 is a neural network and the control device 106 supplies the neural network with input data on the basis of the one or more digital images (depth images with optional color images and intrinsic camera parameter values or a point cloud with optional color images) of an object 113, and the neural network ascertains suitable poses for the end effector 104 (hereinafter assumed to be a gripper by way of example; accordingly mentioned are “grips,” which the machine learning model ascertains). The machine learning model can also ascertain quality values for such grips. The quality value that the neural network outputs for a grip is, for example, a probability value that specifies an expected probability that gripping (or manipulating in general) with the respective grip will be successful. In addition, the machine learning model can also ascertain collision values for such grips. The collision value that the neural network outputs for a grip is, for example, a probability value that specifies an expected probability that gripping (or manipulating in general) with the respective gripper is possible without collision with other objects, the box or the environment. The probability values can be taken into account in subsequent processing in order ultimately to control the robot to carry out the gripping (or manipulation in general).
According to various embodiments, a procedure (in particular an architecture) is provided for the end-to-end training (and associated inference, i.e., prediction) of a machine learning model (e.g., the machine learning model 112) so that the machine learning model can predict grips (together with their collision probability values and quality values) for an end effector 104, such as a gripper with six degrees of freedom (for the gripping pose), specifically a parallel gripper, wherein several possible orientations are considered.
For example, the machine learning model is trained such that it can densely predict contact point pairs (on the surface of the object to be gripped) together with the gripper opening width, a collision probability value and a gripping quality value for several gripper orientations per contact point pair on the basis of a point cloud for a gripping scenario (such as for removing objects from a container). In particular, the prediction of dense contact points and several gripping orientations per contact point pair provides a large solution space that makes it possible to find successful grips (gripping pose plus gripper opening width) even if additional constraints (such as reachability of the robot) prevent the execution of several grips.
The procedure described herein is not limited to a particular design of parallel grippers. The machine learning model is, for example, a neural network, such as a convolutional neural network, e.g., with a U-net architecture, and may also be a 3D convolutional neural network (3D-CNN) and/or have or contain a 3D U-net architecture. The input of the machine learning model consists, for example, of image data (for a gripping scenario, i.e., a control situation, in which an object is to be gripped), which are represented in the form of a 3D voxel grid (e.g., by representing the surface normal vectors per voxel).
The machine learning model contains, for example, a 3D convolutional neural network (e.g., with 3D U-net architecture) that maps each voxel feature (e.g., each normal vector per voxel) of a 3D voxel grid to an output of a 3D voxel grid with the same resolution, wherein, for each voxel, the probability is output as to whether the voxel includes a contact point, a reference point, or no contact point or reference point.
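Purely by way of illustration, the following sketch shows such a per-voxel classification in PyTorch; the layer sizes, names, and depth are illustrative assumptions and do not reproduce the actual architecture (e.g., a 3D U-net with skip connections) of the machine learning model described herein.

```python
# Illustrative sketch (assumption, not the actual network of the embodiments):
# a small 3D CNN that maps a voxel grid of surface normals (3 channels) to
# per-voxel logits for the three classes
# "invalid point" / "contact point" / "gripping center".
import torch
import torch.nn as nn

class VoxelClassifier3D(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, num_classes, kernel_size=1),  # per-voxel class logits
        )

    def forward(self, normal_grid: torch.Tensor) -> torch.Tensor:
        # normal_grid: (batch, 3, D, H, W) surface normal per voxel
        # returns:     (batch, 3, D, H, W) logits (l1, l2, l3) per voxel
        return self.body(normal_grid)

logits = VoxelClassifier3D()(torch.randn(1, 3, 40, 40, 40))
labels = logits.argmax(dim=1)  # 0 = invalid, 1 = contact point, 2 = gripping center
```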
According to various embodiments, the machine learning model is trained by means of a loss function based on the power spherical distribution or a mixed power spherical distribution. However, it is also possible to use a loss function based on a different spherical distribution (e.g., the von Mises-Fisher distribution). Accordingly, the mixture distribution is not necessarily a mixed power spherical distribution but can also be a mixture of different spherical distributions.
The input 201 of the machine learning model 200 consists of a normal grid G, i.e., a 3D voxel grid (i.e., a division of the considered 3D space into voxels), in which each voxel contains a surface normal of the closest point on the surface of an object (i.e., for example, the object 113 to be gripped) to the voxel center. The normal grid is, for example, obtained by preprocessing from a 3D point cloud (e.g., ascertained from a camera image with depth information and intrinsic camera parameter values).
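One possible form of this preprocessing is sketched below, assuming an oriented point cloud (points with surface normals) as input; the function and parameter names are illustrative and not prescribed by the embodiments.

```python
# Illustrative preprocessing sketch (assumption): build a normal grid G by storing,
# for each voxel center, the surface normal of the closest point of an oriented
# point cloud.
import numpy as np
from scipy.spatial import cKDTree

def normal_grid(points, normals, grid_min, voxel_size, dims):
    """points: (N, 3), normals: (N, 3); grid_min: (3,) grid origin;
    voxel_size: edge length of a voxel; dims: (D, H, W) voxel counts."""
    tree = cKDTree(points)
    # voxel centers on a regular grid
    idx = np.stack(np.meshgrid(*[np.arange(d) for d in dims], indexing="ij"), axis=-1)
    centers = grid_min + (idx + 0.5) * voxel_size       # (D, H, W, 3)
    _, nearest = tree.query(centers.reshape(-1, 3))     # index of closest surface point
    G = normals[nearest].reshape(*dims, 3)              # normal of closest point per voxel
    return G
```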
The normal grid G is first processed by a 3D convolutional neural network (3D-CNN) 202 of the machine learning model 200, which comprises several encoder layers, each of which can contain a gate function f (in addition to a convolutional layer). The gate function is optional and can be used to improve the performance in the case of a sparse contact point prediction, such as in gripping scenarios for removing an object from a container in which the objects to be gripped occupy only a small portion of the camera field of view. Here, the gate function serves as a weighting function of the output features of each layer in order to deactivate unimportant features.
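One conceivable form of such a gated encoder layer is sketched here as an assumption (the embodiments do not prescribe this particular gate function f): the convolutional features are multiplied element-wise by a learned sigmoid gate, which can suppress unimportant features in sparse scenes.

```python
# Illustrative sketch (assumption about one possible form of the gate function f).
import torch
import torch.nn as nn

class GatedConv3d(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # gate in [0, 1] weights the output features of the layer
        return torch.relu(self.conv(x)) * torch.sigmoid(self.gate(x))
```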
The result of the 3D-CNN is a voxel grid GC. Each voxel of the grid is represented by a 3D vector (l1, l2, l3), where l1, l2, l3 are confidence values that a voxel is an “invalid point,” a “contact point,” or a “gripping center” (or corresponds to or contains such a point/center since it covers an entire spatial region). This marks each voxel as an “invalid point,” a “contact point,” or a “gripping center” (depending on which confidence value is the highest).
The gripper 300 grips the object 301 at two contact points, i.e., at a contact point pair (c, c′), which are spaced apart by the gripper opening width w (distance between the gripper fingers 302, 303 of the gripper) in the direction of a “basis” vector b. The orientation of the gripper 300 in the gripping pose with which it grips the object 301 is described by a vector a (which points in the direction in which the gripping fingers 302, 303 extend; it can be understood as an approach direction since the gripper is typically moved in this direction, at least shortly before gripping).
The gripper has a gripper center t, and the point t′ = c − l·a is referred to as the gripping center (or also reference point on the gripper), where l is the position of the contact point c along the (here left) gripper finger that touches it.
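The relations defined above can be illustrated numerically as follows (the values are arbitrary example values):

```python
# Numerical illustration of the relations defined above (arbitrary example values):
# partner contact point c' = c + w * b and gripping center t' = c - l * a.
import numpy as np

c = np.array([0.10, 0.00, 0.05])   # contact point on the object surface
b = np.array([0.0, 1.0, 0.0])      # basis vector (direction between the contact points)
a = np.array([0.0, 0.0, -1.0])     # approach vector (direction of the gripper fingers)
w, l = 0.04, 0.02                  # gripper opening width and finger position of c

c_prime = c + w * b                # second contact point of the pair
t_prime = c - l * a                # gripping center (reference point near the contact)
```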
Defining the gripping center t′ near the contact point makes it possible for the gripping center t′ and the contact points to be in the same receptive field, which supports the training of the 3D-CNN.
The machine learning model 200 then samples contact points PC from the voxels marked with “contact point” in GC, i.e., for each of several (where appropriate, all) of the voxels marked with “contact point,” it samples one or more contact points within the voxel (e.g., randomly within the voxel or by subdividing the voxel).
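By way of illustration, sampling one random contact point per voxel marked as “contact point” could look as follows (an assumed variant; subdividing the voxels is equally possible):

```python
# Illustrative sketch (assumption): sample one random contact point inside each voxel
# that was marked as "contact point" in the output grid GC.
import numpy as np

def sample_contact_points(labels, grid_min, voxel_size, rng=np.random.default_rng()):
    """labels: (D, H, W) integer grid, 1 = contact point."""
    idx = np.argwhere(labels == 1)                                 # voxels marked "contact point"
    return grid_min + (idx + rng.random(idx.shape)) * voxel_size   # one random point per voxel
```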
An MLP 203 then receives, as an input, multi-resolution features of the contact points that are obtained by concatenating features provided by the layers (for different resolutions) of the 3D-CNN (i.e., from the feature maps output by the encoder layers of the 3D-CNN). Since the encoder layers provide the features per voxel and not per contact point, interpolation (e.g., trilinear interpolation) is used to ascertain the feature values at the contact points.
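A sketch of such a feature lookup by trilinear interpolation is given below, assuming PyTorch feature maps and contact point coordinates already normalized to the grid extent; the exact normalization and selection of layers are assumptions.

```python
# Illustrative sketch (assumption): read multi-resolution features at the sampled
# contact point locations by trilinear interpolation of the encoder feature maps.
import torch
import torch.nn.functional as F

def interpolate_features(feature_maps, points_norm):
    """feature_maps: list of (1, C_i, D_i, H_i, W_i) tensors from the encoder layers;
    points_norm: (K, 3) contact points normalized to [-1, 1] in (x, y, z) order."""
    grid = points_norm.view(1, 1, 1, -1, 3)                  # (1, 1, 1, K, 3) sampling grid
    feats = [
        F.grid_sample(fm, grid, mode="bilinear", align_corners=True)  # trilinear for 5D input
        .reshape(fm.shape[1], -1).t()                        # (K, C_i)
        for fm in feature_maps
    ]
    return torch.cat(feats, dim=1)                           # (K, sum_i C_i) per contact point
```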
The MLP 203 comprises several layers, and the last layer of the MLP 203 outputs, for each of the sampled contact points, the parameters of a mixed power spherical (PS) distribution ν, κ and ω, the opening width of the gripper w, and the quality q. Here, ν (direction parameter) and κ (spread parameter) are the parameters of a power spherical distribution (as described in Reference 1 but denoted there by μ and κ). The power spherical distribution parameterized by ν and κ is (for the respective contact point and a respective approach direction a) a prediction (or estimate) of the machine learning model for the distribution of the basis vector b. The power spherical distribution can be understood as a kind of distribution of the basis vector b on a hypersphere in 3D space (as a counterpart to a Gaussian distribution). The parameter ν then specifies the mean (or optimally predicted) basis vector, and κ is a measure of the spread around this mean basis vector (the lower κ is, the more uncertain the prediction).
The vector ω comprises nr entries, where each entry ωi is the coefficient of a power spherical distribution in a mixed power spherical distribution. The mixed power spherical distribution is a distribution of the approach vector a, which can be located at one of nr angles (e.g., sampled from or defined in [0, π]) in the plane perpendicular to ν. The mixed power spherical distribution is a sum of power spherical distributions (one for each of the nr angles), which are each weighted by the respective ωi; the sum thus weighted is the mixed power spherical distribution for the approach vector. Each of the power spherical distributions summed therein comprises, as parameters, the approach direction corresponding to the respective angle and κ (see also the formula for the loss term La below).
Each weight ωi rates the permissibility of the respective approach vector (in the range [0, 1]: ωi = 0 in the case of a certain collision, ωi = 1 if certainly collision-free). Thus, ω is both a parameter of the mixed power spherical distribution (in the training) and at the same time a measure or a rating (in the inference) for the permissibility of a grip (collision-free or not; in the inference, the value is between 0 and 1 and specifies the probability of a collision-free grip, i.e., by comparison with a threshold value, it can be determined whether the grip is rated as a collision-free or collision-prone grip) for a particular angle (i.e., approach vector). The mixed power spherical distribution can be considered as a counterpart to a Gaussian mixture distribution for the 3D case.
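For illustration, the log-density of the power spherical distribution of Reference 1 (for d = 3) and of a mixture of such distributions over the nr candidate approach directions can be sketched as follows; normalizing the weights ω to sum to one is an assumption made here so that the mixture is a proper distribution.

```python
# Illustrative sketch (assumption): power spherical log-density (Reference 1) and a
# mixture over the n_r candidate approach directions T(nu)_j, weighted by omega.
import math
import torch

def power_spherical_logpdf(x, mu, kappa, d=3):
    """x, mu: (..., d) unit vectors; kappa: scalar or broadcastable concentration."""
    kappa = torch.as_tensor(kappa, dtype=x.dtype)
    alpha = (d - 1) / 2.0 + kappa
    beta = (d - 1) / 2.0
    log_norm = ((alpha + beta) * math.log(2.0) + beta * math.log(math.pi)
                + torch.lgamma(alpha) - torch.lgamma(alpha + beta))
    return kappa * torch.log1p((mu * x).sum(-1)) - log_norm

def mixture_logpdf(x, mus, kappa, omega):
    """mus: (n_r, 3) candidate approach directions; omega: (n_r,) non-negative weights."""
    comp = power_spherical_logpdf(x.unsqueeze(0), mus, kappa)   # (n_r,) component log-densities
    log_w = torch.log(omega / omega.sum())                      # weight normalization (assumption)
    return torch.logsumexp(log_w + comp, dim=0)
```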
The parameter q is the rating of the quality of the respective gripping contact point, i.e., a measure of how likely a stable grip is at this contact point. All nr grips have the same quality rating q since the stability of the grip does not depend on the gripping direction or the approach vector.
The approach can also be used with other spherical distributions (e.g., the von Mises-Fisher distribution).
In order to merge the information of the MLP 203 with the 3D-CNN 202 again and to improve the prediction for a respective contact point, the values for the output quality q and output values for the collision probability ωi are adapted to the location of the contact points within the respective voxel in a subsequent post-processing (PP) step. For this purpose, contact point interpolation scores are determined for the sampled contact points by means of trilinear interpolation and are multiplied by the respective quality values q of the associated contact points. In addition, the actual positions of the gripping centers are ascertained for the respective contact points and corresponding gripping center interpolation scores are determined by trilinear interpolation of the gripping center grid and are multiplied by the respective weighting factors ωi of the power spherical distribution, which represent the collision freedom for the different approach angles of a contact point.
The output 204 of the machine learning model 200 consists of the sampled contact points PC, the parameters ν, κ and ω, the opening width of the gripper w, and the quality q for each of the contact points as well as the information from GC as to which voxels were marked as “gripping center” and as “contact point.”
Each training data element 401 of the training data set contains a training input for a control situation, from which a normal grid G can be generated by preprocessing (e.g., from a depth image with intrinsic camera parameter values or a 3D point cloud), as well as ground truth data (i.e., labels or target outputs). The ground truth data contain (a number k of) possible grips for the scene of the control situation, each represented as a contact point P̂C, a basis vector b̂, an approach vector â, a gripper width ŵ, and a gripping quality q̂. The training data can also contain gripper parameters, such as the finger length l and the gripper height d (e.g., from the gripper top side to the finger end).
The training data set can be generated, e.g., by means of a simulation environment. This requires the simulation of the objects to be gripped, the camera, and additional components (e.g., a container when removing objects from a container) as well as the selection of several gripping candidates and the rating of their gripping quality (e.g., with a physics simulator).
The machine learning model 400 to be trained now operates as described with reference to
In order to calculate the loss contribution for each sampled contact point PC, the set N(PC) of ground-truth contact points which lie within a radius r around PC and are specified in the ground truth of the respective training data element is ascertained:
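The omitted definition presumably has the form (a reconstruction based on the preceding description, with P̂C denoting the ground-truth contact points of the training data element):

N(PC) = { P̂C from the ground truth : ‖P̂C − PC‖ ≤ r }.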
The sampled contact points are divided into two groups on the basis of the respective N(PC): Points in the set C have at least one ground-truth neighbor (i.e., |N(PC)| ≥ 1). Points in the set C̄ have no ground-truth neighbor.
As explained with reference to
The distribution of the approach directions at each contact point is modeled as a mixed power spherical (PS) distribution, which is parameterized by T(ν̂), κ and ω. T is a transformation which generates nr approach directions T(ν̂)j for ν̂ (according to the specified angular steps).
The contact point P′C belonging to a contact point pair together with the contact point PC (also called the partner contact point belonging to the contact point) can be calculated from ν̂ and PC with the following equation:
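Based on the definition of the partner contact point above (opening width in the direction of the basis vector from the contact point), the omitted equation presumably has the form

P′C = PC + ŵ·ν̂,

where ŵ is the ground-truth opening width; the exact form is an assumption and not reproduced from the original.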
As described with reference to
According to various embodiments, the loss for a training data element is the sum of five loss terms:
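The omitted sum presumably has a form such as (the naming of the terms and the choice of which term carries unit weight are assumptions)

L = LP + α·Lb + β·La + γ·Lw + θ·Lq,

with a contact-point/gripping-center classification loss LP, a basis-vector loss Lb, an approach-direction loss La, an opening-width loss Lw, and a quality loss Lq,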
where α, β, γ, and θ are the weights of the individual loss components.
These loss terms are described in more detail below.
For the training of the machine learning model 400, the parameters of the machine learning model 400, including the parameters of the 3D-CNN 202 and of the MLP 203, are updated by backpropagation (with respect to the loss L, e.g., summed or averaged over a batch of training data elements and the respectively sampled contact points).
The input 501 for the inference consists of a 3D point cloud (e.g., obtained from a depth image with intrinsic camera parameter values from a single viewing angle), which is transformed into the normal grid G.
The machine learning model 400 now operates as described with reference to
The output 502 of the machine learning model 500 consists of the contact points PC and the parameters of the mixed power spherical distribution ν, κ and ω as well as the gripper width w and the gripping quality q for each contact point.
By means of a transformation 503, the output 502 can be transformed into a 6-DoF robot gripping pose representation (which can be executed directly by a robot controller, for example). For this purpose, the gripper center t and the gripper orientation matrix R are needed. The relationship between the gripper center t and the contact point c is given by:
The orientation matrix R can be calculated from the approach vector a and the basis vector b as follows:
Here, d is the gripper height and b corresponds to ν, which is part of the output 502 of the machine learning model. As in the training, during the inference, the transformation T(⋅) is applied to each ν in order to generate nr approach vectors for each contact point. The entries of ω (i.e., the permissibility rating for each approach vector) can be converted into a binary rating by setting a threshold value, in order to filter out collision-free grips.
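The equations for t and R referenced above are not reproduced here; a plausible reconstruction under assumed conventions is sketched below (the column ordering of R and the offset from the contact point to the gripper center are assumptions, not the formulas of the embodiments).

```python
# Illustrative sketch (conventions assumed): convert a predicted contact point c,
# basis vector b (corresponding to nu), approach vector a, opening width w and the
# gripper geometry (finger length l, gripper height d) into a gripper center t and
# an orientation matrix R.
import numpy as np

def grasp_pose(c, b, a, w, l, d):
    b = b / np.linalg.norm(b)
    a = a / np.linalg.norm(a)
    # Assumed right-handed column convention for the gripper frame: (b, a x b, a).
    R = np.column_stack((b, np.cross(a, b), a))
    # Assumed offset from the contact point to the gripper center: half the opening
    # width along b, and finger length plus gripper height against the approach direction.
    t = c + 0.5 * w * b - (l + d) * a
    return t, R
```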
The final result 504 is a list of collision-free grips, which are each specified by a translation vector t, a rotation matrix R, a gripper width w, and the gripping quality q.
Alternatively, in addition to the grips output, the permissibility rating ωi for each grip (without conversion into a binary rating) can also be output in 504, e.g., in order to use the probability, represented thereby, of a collision-free grip in a downstream robot movement planning.
In summary, according to various embodiments, a method is provided as shown in
In 601, for each training data element of a set of training data elements, wherein each training data element comprises training input information (e.g., a point cloud or a depth image with intrinsic camera parameter values or information derived therefrom, such as a 3D voxel normal grid) about the location of surface points of a respective object and one or more possible approach directions of the robot, e.g., including the associated gripping quality (or the probability of a successful grip) for manipulating the object,
In 604, the machine learning model is trained to reduce a loss that contains, per training data element and per possible approach direction, an approach-direction loss component that decreases with increasing probability that the mixture distribution provides the possible approach direction, wherein the direction parameter (i.e., mean direction vector) of each of the spherical distributions is set according to the end-effector orientation angle assigned to the spherical distribution (i.e., the direction vector is set such that it corresponds to the assigned end-effector orientation angle and is located in a plane whose surface normal coincides with the direction of a basis vector that specifies the direction between two contact points (specifically when gripping) and can be determined as ground truth from the assigned basis vectors of the training data element). The training can take place according to a conventional procedure for training a machine learning model (which is, for example, a neural network or contains one or more such neural networks), typically by backpropagation.
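By way of illustration, such an approach-direction loss component can be sketched as the negative log-likelihood of a ground-truth approach direction under the predicted mixture, reusing mixture_logpdf from the sketch given earlier; the concrete loss form is an assumption consistent with the description above.

```python
# Illustrative sketch (assumption): approach-direction loss as the negative
# log-likelihood of a ground-truth approach vector a_hat under the predicted mixture
# of power spherical distributions (reuses mixture_logpdf from the earlier sketch).
def approach_direction_loss(a_hat, approach_candidates, kappa, omega):
    # a_hat: (3,) ground-truth approach vector; approach_candidates: (n_r, 3) = T(nu)_j
    return -mixture_logpdf(a_hat, approach_candidates, kappa, omega)
```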
The procedure described with reference to
The grips that the training data elements contain are each, for example, represented by a contact point (between object and gripping finger), basis vector (which describes the direction between two contact points of a grip), approach vector of the grip (which describes the approach direction perpendicular to the basis vector), gripper opening width (which describes the distance between two fingers of a parallel gripper), and the associated gripping quality (which describes the probability of a successful grip).
The spherical distribution at a contact point (e.g., for a basis vector, which specifies the direction between two contact points, or for an approach vector, which specifies a direction perpendicular to the basis vector when gripping) is determined, for example, by means of a set, assigned to the contact point, of contact points and the associated directions (e.g., basis vectors and approach vectors) of a training data element from a ground truth data set.
The machine learning model also ascertains, for example, parameters of a spherical distribution (e.g., distribution parameter values ν and κ of a power spherical distribution) for the basis vectors (which describe the direction between the two contact points of a contact point pair of a grip) for manipulating the object.
The loss may comprise further loss components (loss for basis vectors, approach vectors, gripper opening width, gripping quality, and gripping contact point/gripping center). The machine learning model can thus, for example, be trained to reduce a loss as a weighted sum of a loss for the basis vectors (described by the parameter values ν and κ of a power spherical distribution), a loss for the approach vectors (described by the weights ω of a mixture distribution of different power spherical distributions for different approach vectors per contact point), a loss for the gripper opening width (described by the distance of the contact points of a contact point pair), a loss for the gripping quality (described by the probability of a successful grip), and a loss for the contact points and gripping centers.
The method in
The method is therefore in particular computer-implemented according to various embodiments.
After the training, the machine learning model can be applied to sensor data which are determined by at least one sensor. For example, after the training, the machine learning model is used to generate a control signal for a robot by supplying it with sensor data with respect to the robot and/or its environment.
Various embodiments can receive and use time series of sensor data from various sensors, such as video, radar, lidar, ultrasound, motion, thermal imaging, etc. Sensor data can be measured or also simulated for periods of time. Embodiments can be used to train a machine learning system and to control a robot, e.g., autonomously by robot manipulators, in order to achieve various manipulation tasks in different scenarios. In particular, embodiments are applicable to the control and monitoring of the execution of manipulation tasks, for example in assembly lines. They can, for example, be seamlessly integrated with a traditional GUI for a control process.
Number | Date | Country | Kind |
---|---|---|---|
10 2023 207 208.4 | Jul 2023 | DE | national |