The present invention is directed to a system and method for grasp synthesis of non-occluded and occluded objects. Grasp synthesis refers to the process of determining the optimal way for a robotic manipulator to grasp an object. This involves calculating the appropriate contact points, grip force, gripper configurations, and gripper position and orientation with respect to the object needed to securely hold or manipulate the object. Effective grasp synthesis must take into account various factors such as the object's shape, size, weight, texture, and the surrounding environment.
Fully autonomous grasp synthesis still faces challenges in terms of adaptability and reliability, especially in dynamic and unstructured environments. This is where semi-automated approaches can offer significant advantages, combining the precision of automated systems with the intuition and problem-solving capabilities of human operators. In such systems, the cognitive load on the user and the time required for execution are minimized through partial automation, while the user is presented with visualizations and the option to guide or intervene throughout the process.
The present invention pertains to a system and method for the development of grasp synthesis and grasp execution with a camera-equipped robot manipulator. The camera can contain grayscale, color, and/or color-and-depth modules and may include an inertial measurement unit (“IMU”).
In addition to the camera-equipped robot manipulator, the system may contain a device (or devices) to relay information to a human operator throughout the grasp synthesis and execution process. This device is used to present the user with information about the state of the system, object information, and synthesized grasp candidates. The system may further contain a device (or devices) to gather input from the human operator, which allows the operator to assist or intervene in the process of grasp synthesis and execution or take over control of the system entirely if required.
Other features and aspects of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the features in accordance with embodiments of the invention. The summary is not intended to limit the scope of the invention, which is defined solely by the claims attached hereto.
The various embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
For example, and not by way of limitation, one of the following methods may be used to specify the target object: through a single click on a display showing the image taken by the gripper camera—a segmentation mask of the object is then generated using either a deep learning segmentation model or a traditional segmentation method (as shown in
If the target object is surrounded by clutter, the user can instruct the system to “clear clutter” before proceeding with grasp synthesis. If instructed to clear clutter, the system will compute and execute an end effector path that makes contact with the target object and multiple objects surrounding it, displacing the target object and other objects that may be obstructing it (as shown in
The robot autonomously navigates the gripper camera such that the object is centered in the camera image. This can be done using 3D features from the depth camera or based purely on the feedback extracted from a gripper-mounted vision sensor.
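For example, and not by way of limitation, the following Python sketch illustrates a purely image-feedback-based centering step; the proportional gain, sign convention, and function name are illustrative assumptions rather than part of the claimed system.

```python
import numpy as np

def centering_velocity(mask: np.ndarray, image_shape, gain: float = 0.5):
    """Proportional image-based servoing step (illustrative sketch).

    mask        -- binary segmentation mask of the target object in the
                   gripper camera image
    image_shape -- (height, width) of the camera image
    gain        -- hypothetical proportional gain mapping pixel error to a
                   normalized camera-frame velocity command
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.zeros(2)                       # object not visible: stop
    centroid = np.array([xs.mean(), ys.mean()])  # object centroid (pixels)
    center = np.array([image_shape[1] / 2.0, image_shape[0] / 2.0])
    error_px = centroid - center                 # pixel offset from image center
    # Normalize by image size so the command is resolution independent, then
    # scale by the gain; the sign convention depends on the camera mounting
    # and is an assumption here.
    return -gain * error_px / np.array([image_shape[1], image_shape[0]])
```

The returned normalized velocity command would be mapped to lateral end-effector motion and applied repeatedly until the pixel error falls below a tolerance.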
For the best N2 grasps according to the quality metric, a sketched representation of each grasp is rendered onto the gripper camera image (for example, and not by way of limitation, with an opacity less than 100%) and displayed to the user (as shown in
If no promising grasps are identified by the system or the user, the system may autonomously restart the grasp synthesis process (including the generation of the 3D map) or change the position of the manipulator base in an automated or user-controlled fashion. If one or more promising grasps were identified, the system executes the selected grasp: the gripper is moved to the pre-grasp pose, and the gripper opening is set according to the planned grasp. The gripper is then moved from the pre-grasp pose to the grasp pose along the prescribed grasp approach axis according to one of the methods described herein.
Throughout the grasp approach and execution, the user is presented with a live camera feed, updated renderings that explain how the selected grasp will interact with the target object, and a measure of the applied gripping torque conveyed through a visualization and/or an audible signal, and the user can intervene in the grasp execution as they deem appropriate. The user can also be informed, through a visual or audible signal, if a grasp failure is detected.
The grasp proposal methods available in the system include synthesis based on the target object's point clouds as well as methods that rely solely on input from the gripper camera's vision sensor, which may be grayscale or RGB. In one grasp proposal method that leverages a point cloud representation of the object, the system first computes the point cloud centroid. It then generates a gripper pose G that places the ‘grasp point’ at the point cloud centroid, generates M1 poses {G_m1} by translating G in Cartesian coordinates by a small distance (for example, and not by way of limitation, 0-1 cm) in random directions, and then samples M2 gripper poses around each pose in {G_m1} by rotating the gripper in increments of 180 degrees/M2. This process is repeated M3 times, resulting in M1×M2×M3 grasp pose candidates. This method is well-suited for thin-walled objects such as mugs and open boxes, and for thin objects such as pens and cables, among other objects.
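For example, and not by way of limitation, the following Python sketch illustrates this sampling scheme, assuming gripper poses are represented as 4x4 homogeneous transforms and that the base pose G is given an arbitrary (here, identity) orientation; the function and parameter names are illustrative only.

```python
import numpy as np

def sample_centroid_grasps(points, M1=8, M2=6, M3=3, max_offset=0.01, rng=None):
    """Sample M1*M2*M3 candidate gripper poses around the point cloud centroid.

    points     -- (N, 3) point cloud of the target object
    max_offset -- radius of the random translations in metres (0-1 cm here)
    Returns a list of 4x4 homogeneous gripper poses whose translation places
    the grasp point near the centroid; orientations differ by rotations of
    180/M2 degrees about the approach (z) axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    centroid = points.mean(axis=0)
    candidates = []
    for _ in range(M3):
        for _ in range(M1):
            # Random direction scaled by a random distance in [0, max_offset].
            direction = rng.normal(size=3)
            direction /= np.linalg.norm(direction)
            offset = direction * rng.uniform(0.0, max_offset)
            for k in range(M2):
                angle = np.deg2rad(k * 180.0 / M2)
                c, s = np.cos(angle), np.sin(angle)
                pose = np.eye(4)
                pose[:3, :3] = np.array([[c, -s, 0.0],
                                         [s,  c, 0.0],
                                         [0.0, 0.0, 1.0]])
                pose[:3, 3] = centroid + offset
                candidates.append(pose)
    return candidates
```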
In another grasp proposal method that leverages a point cloud representation of the object, the system fits a cuboid or cylinder to the point cloud of the target object, places the grasp point at the centroid of the cuboid or cylinder, orients the gripper such that the grasp axis aligns with the direction of the smallest dimension of the cuboid or cylinder, and sets the desired gripper opening to the smallest object dimension plus a small offset.
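For example, and not by way of limitation, the following sketch shows a PCA-based cuboid fit that yields the grasp center, grasp axis, and gripper opening described above; a cylinder fit could be substituted, and the names used here are illustrative assumptions.

```python
import numpy as np

def cuboid_grasp(points, opening_offset=0.01):
    """Fit an oriented cuboid to the point cloud via PCA and derive a grasp.

    Returns (grasp_center, grasp_axis, gripper_opening); the opening offset
    default of 1 cm is a placeholder value.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigenvectors of the covariance matrix give the cuboid orientation;
    # point extents along each axis give the cuboid dimensions.
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    axes = eigvecs.T                              # rows are principal axes
    extents = np.array([centered @ a for a in axes])
    dims = extents.max(axis=1) - extents.min(axis=1)
    smallest = int(np.argmin(dims))
    grasp_axis = axes[smallest]                   # close fingers along this axis
    gripper_opening = dims[smallest] + opening_offset
    return centroid, grasp_axis, gripper_opening
```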
In another grasp proposal method that leverages a point cloud representation of the object, the system projects the 3D points of the target object onto the ‘background plane’ and then applies principal component analysis to the projected points to find the following two axes: the primary axis a1 explaining maximum variance, and the secondary axis a2 along which the residuals are minimized (where a1 and a2 lie within the background plane). The system then generates a grasp pose candidate for which the gripper's grasp axis aligns with a2.
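For example, and not by way of limitation, the following sketch computes a1 and a2 from the object points and a known background-plane normal; the representation of the plane by its unit normal is an assumption of this sketch.

```python
import numpy as np

def planar_pca_grasp_axes(points, plane_normal):
    """Project object points onto the background plane and find axes a1, a2.

    points       -- (N, 3) point cloud of the target object
    plane_normal -- unit normal of the background plane (assumed known)
    Returns (a1, a2) in world coordinates; the grasp axis of the candidate
    pose would be aligned with a2, the in-plane axis of minimum residual.
    """
    n = plane_normal / np.linalg.norm(plane_normal)
    # Remove the out-of-plane component of every point.
    projected = points - np.outer(points @ n, n)
    centered = projected - projected.mean(axis=0)
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    # The projected covariance is rank-2: the largest eigenvalue gives a1,
    # the second largest gives a2 (the near-zero one is the plane normal).
    a1 = eigvecs[:, 2]
    a2 = eigvecs[:, 1]
    return a1, a2
```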
In another grasp proposal method that leverages a point cloud representation of the object, a deep neural network is trained beforehand to map a point cloud to a set of 6DoF grasp poses. The neural network is used to infer suitable grasp candidates from the target object's point cloud.
In another grasp proposal method that leverages a point cloud representation of the object, an object recognition module (for example, a deep neural network or a point cloud-based object matching algorithm such as iterative closest point (“ICP”)) is used to associate the target object with an object maintained in a library of known objects. In this library, each of the known objects is linked with a set of suitable grasp poses.
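For example, and not by way of limitation, the following sketch uses the Open3D library's ICP registration to match the target point cloud against such a library; the library structure (a list of model point clouds with associated grasp poses) and the fitness threshold are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

def match_known_object(target_points, library, max_dist=0.01):
    """Match the target point cloud against a library of known objects via ICP.

    library -- list of dicts with keys 'points' (Nx3 model point cloud) and
               'grasps' (list of 4x4 grasp poses in the model frame); this
               structure is an illustrative assumption.
    Returns the stored grasps of the best-matching object, transformed into
    the scene frame, or None if no match has sufficient ICP fitness.
    """
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_points))
    best = None
    for entry in library:
        model = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(entry["points"]))
        result = o3d.pipelines.registration.registration_icp(
            model, target, max_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        if best is None or result.fitness > best[0]:
            best = (result.fitness, result.transformation, entry["grasps"])
    if best is None or best[0] < 0.5:             # fitness threshold is a guess
        return None
    # Map each stored grasp from the model frame into the scene frame.
    return [best[1] @ g for g in best[2]]
```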
In one grasp proposal method that leverages the gripper image of the object as well as the user input indicating the object, the image segmentation mask of the target object is obtained (for example, from the input method, or with a neural network trained for segmentation). Principal component analysis is then run on the segmentation mask pixels (in the image plane) to identify the axis a1 explaining maximum variance and the axis a2 along which the residuals are minimized. The grasp pose candidate is oriented such that the grasp axis aligns with axis a2. This method is particularly useful for scenarios where the orientation of the supporting plane (such as a floor, table, door, or wall) can be identified but a good 3D representation of the target object is difficult to obtain. Reasons why obtaining a good 3D representation may not be possible include: the object of interest is too thin or too small and its dimensions are near or below the resolution of the depth camera (for example, the object of interest is a key, a sheet of paper or other material, a coin, a cable, etc.); or the object of interest has surface properties that make it difficult to obtain a good depth image (for example, the object of interest has a reflective surface, a transparent surface, etc.).
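For example, and not by way of limitation, the following sketch runs principal component analysis on the mask pixels to obtain a1 and a2 in the image plane; the function name and output convention are illustrative.

```python
import numpy as np

def mask_grasp_axes(mask):
    """PCA on the segmentation mask pixels (image plane) to find a1 and a2.

    mask -- binary (H, W) segmentation mask of the target object
    Returns (a1, a2) as 2D unit vectors in pixel coordinates; the in-image
    grasp candidate's grasp axis is aligned with a2.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("empty segmentation mask")
    pixels = np.stack([xs, ys], axis=1).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    a1 = eigvecs[:, 1]                       # maximum-variance axis
    a2 = eigvecs[:, 0]                       # minimum-residual axis
    return a1, a2
```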
In another grasp proposal method that leverages the gripper image of the object, a deep neural network is trained beforehand to predict grasp poses from grayscale or RGB images and used to predict suitable grasp candidates.
In another grasp proposal method, the system relies on a demonstration from the user of how to grasp a previously unknown object. In this scenario, the target object is presented to the system beforehand in a separate ‘teach step’. In this teach step, the system moves the gripper camera around the presented object to acquire a 3D representation of the object; this representation may include color information as well. The operator then demonstrates suitable grasp poses on the object. This information is then stored in memory, in a library of ‘known objects’, together with a descriptor of the object. During regular operation (i.e., after the teach step), when a target object is encountered with a descriptor that matches an object in the system's library, the previously demonstrated grasps are proposed (accounting for the target object's pose). In another grasp proposal method, some or all of the aforementioned methods are combined.
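For example, and not by way of limitation, the following sketch outlines a minimal ‘known objects’ library with a teach step and a matching step; the descriptor representation (a plain feature vector compared by Euclidean distance) and the matching threshold are illustrative assumptions.

```python
import numpy as np

class KnownObjectLibrary:
    """Minimal sketch of the 'known objects' library described above.

    Descriptors are treated as abstract feature vectors here; in practice
    they could be learned embeddings or geometric signatures (an assumption
    of this sketch).
    """

    def __init__(self):
        self._entries = []   # (descriptor, list of 4x4 grasps in object frame)

    def teach(self, descriptor, demonstrated_grasps):
        """Store grasps demonstrated by the operator during the teach step."""
        self._entries.append((np.asarray(descriptor, float),
                              list(demonstrated_grasps)))

    def propose(self, descriptor, object_pose, max_distance=0.5):
        """Return stored grasps for the closest descriptor, re-posed to the scene.

        object_pose -- 4x4 transform of the recognized object in the scene.
        """
        descriptor = np.asarray(descriptor, float)
        if not self._entries:
            return []
        distances = [np.linalg.norm(descriptor - d) for d, _ in self._entries]
        best = int(np.argmin(distances))
        if distances[best] > max_distance:        # threshold is illustrative
            return []
        _, grasps = self._entries[best]
        # Account for the target object's pose in the scene.
        return [object_pose @ g for g in grasps]
```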
In one of the grasp quality metrics implemented in the system, the system rewards grasps whose grasp axis is perpendicular to the surface of the target object: given the surface normals of the target object (which may be estimated in a general preprocessing step, or ad hoc), the quality score/metric is obtained as the average of the dot product between the grasp axis and the surface normals, with the average taken over the area inside the grasp box.
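For example, and not by way of limitation, the following sketch computes this normal-alignment score; the use of the absolute dot product (to handle normal orientation ambiguity) and the boolean grasp-box mask are implementation assumptions of this sketch.

```python
import numpy as np

def normal_alignment_score(grasp_axis, surface_normals, in_grasp_box):
    """Score a grasp by how perpendicular the grasp axis is to the object surface.

    grasp_axis      -- unit vector along which the fingers close
    surface_normals -- (N, 3) estimated unit normals on the object surface
    in_grasp_box    -- boolean mask selecting the normals inside the grasp box
    A score of 1.0 means the grasp axis is parallel to the surface normals,
    i.e. perpendicular to the object surface.
    """
    normals = surface_normals[in_grasp_box]
    if len(normals) == 0:
        return 0.0
    return float(np.mean(np.abs(normals @ grasp_axis)))
```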
Another metric can be obtained by assigning a higher score to grasps for which the object surface normals align with the gripper finger surface normals when in contact with the object (the score is based on a volume-averaged dot product between the object surface normals and the finger surface normals, averaged over the grasp box).
Another metric can be obtained by assigning a higher score to grasps closer to the object centroid. Here, the score is inversely proportional to the distance between the object point cloud's centroid and the grasp axis.
A similar metric can be obtained by first using a neural network to estimate the target object's centroid from grayscale, RGB, and/or point cloud data, and then assigning a higher score to grasps closer to the estimated centroid. Here, the score is inversely proportional to the distance between the estimated centroid and the grasp axis.
Another metric can be obtained by assigning higher scores to grasps for which a larger part of the grasp box overlaps with the target object's volume.
Another metric can be obtained by assigning scores to grasps proportional to the number of target object point cloud points within the grasp box for each grasp. Another metric can be obtained by estimating the object's surface properties (specifically, the coefficient of friction) across various parts of the object (using the point cloud and/or image data as estimation input), and then summing the estimated coefficient of friction across the part of the object's surface that falls inside the grasp box of a given grasp. Another metric can be obtained by a weighted combination of all the metrics mentioned above.
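For example, and not by way of limitation, the following sketch shows the point-count metric and a weighted combination of metric scores; the representation of the grasp box as an axis-aligned box of half-extents in the gripper frame is an assumption of this sketch.

```python
import numpy as np

def points_in_grasp_box(points, grasp_pose, box_extent):
    """Count object point cloud points inside the grasp box of a candidate grasp.

    grasp_pose -- 4x4 pose of the gripper; the grasp box is assumed to be an
                  axis-aligned box with half-extents `box_extent` in that frame.
    """
    world_to_grasp = np.linalg.inv(grasp_pose)
    local = (world_to_grasp[:3, :3] @ points.T).T + world_to_grasp[:3, 3]
    inside = np.all(np.abs(local) <= box_extent, axis=1)
    return int(inside.sum())

def combined_quality(metric_scores, weights):
    """Weighted combination of the individual quality metrics described above."""
    scores = np.asarray(metric_scores, float)
    w = np.asarray(weights, float)
    return float(scores @ w / w.sum())
```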
In one automated grasp approach method, the gripper moves from the pre-grasp pose to the grasp pose until one of the following termination criteria is met: in force-based grasp approach termination, the gripper is moved towards the grasp target pose until the external force on the gripper along the grasp approach axis exceeds a predetermined threshold FT; in pose-based grasp approach termination, the gripper is moved towards the grasp pose until the error in position is smaller than a predetermined threshold PT and the error in orientation is smaller than a predetermined threshold OT. Once the termination criterion is met, the gripper closes. In one semi-automated grasp approach method, the operator commands the gripper to move along the approach axis (through joystick inputs, a slider on a screen, buttons, etc.) and confirms with a button click when the gripper should begin closing. The system of the present invention may further comprise at least one processor in communication with the robot manipulator, the user interaction device, and other components associated with the system, including but not limited to a computer and a memory.
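For example, and not by way of limitation, the following sketch illustrates the force-based and pose-based termination criteria; the `robot` interface (measured_force, gripper_pose, step_along, close_gripper), the step size, and the default threshold values are hypothetical placeholders.

```python
import time
import numpy as np

def pose_error(current, target):
    """Position (m) and orientation (rad) error between two 4x4 poses."""
    pos_err = np.linalg.norm(current[:3, 3] - target[:3, 3])
    R_err = current[:3, :3].T @ target[:3, :3]
    ori_err = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    return pos_err, ori_err

def approach_and_grasp(robot, grasp_pose, approach_axis, mode="force",
                       FT=10.0, PT=0.003, OT=0.05, step=0.002, timeout=10.0):
    """Advance along the approach axis until the selected criterion fires, then close.

    mode -- "force" for force-based termination (threshold FT, in newtons) or
            "pose" for pose-based termination (thresholds PT in metres and
            OT in radians), matching the two strategies described above.
    """
    start = time.time()
    while time.time() - start < timeout:
        if mode == "force":
            # Force-based termination: external force along the approach axis.
            if robot.measured_force() @ approach_axis > FT:
                robot.close_gripper()
                return "terminated: force"
        else:
            # Pose-based termination: position and orientation errors.
            pos_err, ori_err = pose_error(robot.gripper_pose(), grasp_pose)
            if pos_err < PT and ori_err < OT:
                robot.close_gripper()
                return "terminated: pose"
        robot.step_along(approach_axis, step)   # small advance, then re-check
    return "timeout"
```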
While various embodiments of the disclosed technology have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosed technology, which is done to aid in understanding the features and functionality that may be included in the disclosed technology. The disclosed technology is not restricted to the illustrated example architectures or configurations, but the desired features may be implemented using a variety of alternative architectures and configurations. Indeed, it will be apparent to one of skill in the art how alternative functional, logical or physical partitioning and configurations may be implemented to implement the desired features of the technology disclosed herein. Also, a multitude of different constituent module names other than those depicted herein may be applied to the various partitions. Additionally, with regard to flow diagrams, operational descriptions and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.
Although the disclosed technology is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead may be applied, alone or in various combinations, to one or more of the other embodiments of the disclosed technology, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the technology disclosed herein should not be limited by any of the above-described exemplary embodiments.
Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms “a” or “an” should be read as meaning “at least one,” “one or more” or the like; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future.
This application claims the benefit of earlier filed U.S. Patent Application Ser. No. 63/533,378, filed Aug. 18, 2023, the entire contents of which are hereby incorporated by reference.