The present disclosure relates to the field of industrial robot programming and, more particularly, to a method for programming a robot by human demonstration to perform a workpiece pick, move and place operation, including vision and force sensors to track both the human hand and the workpiece, where the method uses the sensor inputs for teaching both motions and state change logic, and generates robot programming commands from the motions and the state changes.
The use of industrial robots to repeatedly perform a wide range of manufacturing, assembly and material movement operations is well known. However, teaching a robot to perform even a fairly simple operation—such as picking up a workpiece in a random position and orientation on a conveyor and moving the workpiece to a shipping container—has been unintuitive, time-consuming and/or costly using conventional methods.
Robots have traditionally been taught to perform pick and place operations of the type described above by a human operator using a teach pendant. The teach pendant is used by the operator to instruct the robot to make incremental moves—such as “jog in the X-direction” or “rotate gripper about local Z-axis”—until the robot and its gripper are in the correct position and orientation to grasp the workpiece. Then the robot configuration and the workpiece position and pose are recorded by the robot controller to be used for the “pick” operation. Similar teach pendant commands are then used to define the “move” and “place” operations. However, the use of a teach pendant for programming a robot is often found to be unintuitive, error-prone and time-consuming, especially to non-expert operators.
Another known technique of teaching a robot to perform a pick and place operation is the use of a motion capture system. A motion capture system consists of multiple cameras arrayed around a work cell to record positions and orientations of a human operator and a workpiece as the operator manipulates the workpiece. The operator and/or the workpiece may have uniquely recognizable marker dots affixed in order to more precisely detect key locations on the operator and the workpiece in the camera images as the operation is performed. However, motion capture systems of this type are costly, and are difficult and time-consuming to set up and configure precisely so that the recorded positions are accurate.
Other existing systems which teach robot programming from human demonstration exhibit various limitations and disadvantages. One such system requires the use of a special glove fitted with sensors to determine hand and workpiece actions. Other systems visually track either the hand or the workpiece, but have difficulty determining accurate gripper commands due to visual occlusion of the hand by the workpiece, inability to decipher hand velocity transitions, or for other reasons.
In light of the circumstances described above, there is a need for an improved robot teaching technique which is simple and intuitive for a human operator to perform, and which reliably captures motions and actions such as gripping and ungripping.
In accordance with the teachings of the present disclosure, a method for teaching a robot to perform an operation based on human demonstration using force and vision sensors is described and illustrated. The method includes using a vision sensor to detect the position and pose of the human demonstrator's hand, and optionally of a workpiece, during teaching of an operation such as pick, move and place. A force sensor, located either beneath the workpiece or on a tool, is used to detect force information. Data from the vision and force sensors, along with other optional inputs, are used to teach both motions and state change logic for the operation being taught. Several techniques are disclosed for determining state change logic, such as the transition from approaching to grasping. Techniques for refining the taught motions to remove extraneous movements of the hand are also disclosed. Robot programming commands are then generated from the hand positions and orientations, along with the state transitions.
Additional features of the presently disclosed systems and methods will become apparent from the following description and appended claims, taken in conjunction with the accompanying drawings.
The following discussion of the embodiments of the disclosure directed to robot program generation by human demonstration is merely exemplary in nature, and is in no way intended to limit the disclosed devices and techniques or their applications or uses.
It is well known to use industrial robots for a variety of manufacturing, assembly and material movement operations. One such type of robotic operation is known as “pick, move and place”, where a robot picks up a part or workpiece from a first location, moves the part and places it at a second location. The first location may be a conveyor belt where randomly oriented parts are streaming, such as parts which were just taken from a mold. The second location may be a shipping container in which the part needs to be placed in a particular location and pose. Another example involves the robot picking up a component and installing it in a product assembly operation.
In order to perform pick, move and place operations of the type described above, a camera is typically used to determine the position and orientation of parts to be picked up, and a robot must be taught to grasp the part in a specific manner using a prescribed gripper such as a finger-type gripper, a parallel-jaw gripper or a magnetic or suction cup gripper. Teaching the robot how to grasp the part according to the part's orientation has traditionally been done by a human operator using a teach pendant. The teach pendant is used by the operator to instruct the robot to make incremental moves—such as “jog in the X-direction” or “rotate gripper about local Z-axis”—until the robot and its gripper are in the correct position and orientation to grasp the workpiece. Then the robot configuration and the workpiece position and pose are recorded by the robot controller to be used for the “pick” operation. Similar teach pendant commands are then used to define the “move” and “place” operations. However, the use of a teach pendant for programming a robot is often found to be unintuitive, error-prone and time-consuming, especially to non-expert operators.
Another known technique of teaching a robot to perform a pick, move and place operation is the use of a motion capture system. A motion capture system consists of multiple cameras arrayed around a work cell to record positions and orientations of a human operator and a workpiece as the operator manipulates the workpiece. The operator and/or the workpiece may have uniquely recognizable marker dots affixed in order to more precisely detect key locations on the operator and the workpiece in the camera images as the operation is performed. However, motion capture systems of this type are costly, and are difficult and time-consuming to set up and configure precisely so that the recorded positions are accurate.
The present disclosure overcomes the limitations of existing robot teaching methods by providing a technique which uses force and vision sensors to record information about a human demonstrator's hand and the workpiece, including both motions and state transition logic, and uses the recorded information to generate a robot program with the pick, move and place motions and corresponding gripping and ungripping commands. The technique is of course applicable to any type of robotic part handling operation—not just pick, move and place.
Along with several techniques for identifying state transitions, and other techniques for refining motions and grasp poses to mitigate extraneous human hand motion, the methods of the present disclosure include identifying and tracking key points of a human hand by analysis of camera images or vision sensor data. Methods and systems for identifying and tracking key points of a human hand by analysis of camera images were disclosed in U.S. patent application Ser. No. 16/843,185 (hereinafter “the '185 application”), titled ROBOT TEACHING BY HUMAN DEMONSTRATION, filed 8 Apr. 2020 and commonly assigned with the present application, and hereby incorporated by reference in its entirety.
The '185 application discloses techniques for analyzing images from a single two-dimensional (2D) or three-dimensional (3D) camera to identify and track 3D coordinates of key points of the human hand. The key points include anatomic features such as the tip, knuckle and base of the thumb, and the tips, knuckles and bases of the fingers. From the 3D coordinates of these key points, the '185 application discloses techniques for computing gripper position and orientation, for both finger-style grippers and suction-cup style grippers among others. For example, the thumb and index finger of the human hand can be transposed to positions of a parallel-jaw or finger-style gripper, or a bisector of the thumb and index finger can be used as an axis of a single-suction-cup gripper. These hand pose and corresponding gripper pose determination techniques based on camera image input, from the '185 application, are used extensively in the methods of the present disclosure.
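By way of illustration only, one simple way to transpose hand key points to a gripper target could be sketched in Python (using numpy) as follows; the key point names and the geometric construction below are assumptions for illustration and do not reproduce the exact computations of the '185 application.

```python
import numpy as np

def gripper_pose_from_hand(thumb_tip, index_tip, thumb_base):
    """Illustrative sketch: derive a gripper target from hand key points.

    Inputs are 3D points (numpy arrays) for hand key points detected in
    camera images.  The names and the simple geometry below are assumptions
    for illustration only.
    """
    thumb_tip = np.asarray(thumb_tip, dtype=float)
    index_tip = np.asarray(index_tip, dtype=float)
    thumb_base = np.asarray(thumb_base, dtype=float)

    # Parallel-jaw or finger-style gripper: place the two jaws at the thumb
    # and index fingertips; the grasp point is their midpoint.
    grasp_point = 0.5 * (thumb_tip + index_tip)
    jaw_opening = np.linalg.norm(index_tip - thumb_tip)

    # Suction-cup style gripper: use the bisector of the thumb and index
    # directions (measured from the thumb base) as the approach axis.
    v_thumb = thumb_tip - thumb_base
    v_index = index_tip - thumb_base
    bisector = v_thumb / np.linalg.norm(v_thumb) + v_index / np.linalg.norm(v_index)
    approach_axis = bisector / np.linalg.norm(bisector)

    return grasp_point, jaw_opening, approach_axis
```

In such a sketch, the midpoint and opening width would drive a parallel-jaw or finger-style gripper, while the bisector axis would orient a single-suction-cup gripper.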
A human demonstrator 130 uses one or both hands to operate on one or more workpieces. In one example, a workpiece 140 rests on a platform 150. The human demonstrator 130 may pick up the workpiece 140, move it and place it in a new position and pose. A force sensor 142 is located beneath the workpiece 140, either directly beneath the workpiece 140, or under the platform 150. The force sensor 142 is positioned and configured to detect forces on the workpiece 140—particularly vertical forces. The usage of the data from the force sensor 142 is discussed later.
In another example, a second workpiece 160 is coupled to a tool 170 with a force sensor 162 coupled between the second workpiece 160 and the tool 170. The human demonstrator manipulates the tool 170 such that the second workpiece 160 is placed in a desired position and pose in the workpiece 140—such as an electronic component plugged into an assembly, or a peg inserted into a hole. The force sensor 162 is configured to detect forces and torques on the second workpiece 160 during the assembly operation. The usage of the data from the force sensor 162 is discussed later.
The force sensors 142 and/or 162 (usually only one or the other) provide their force and torque data to the computer 120, either wirelessly or via hard-wire connection. Along with the data from the force sensors 142 and/or 162, the computer 120 receives the audio signals from the microphone 112 and the images from the vision sensor 110, where the images are analyzed to determine the pose (3D positions of the key points) of at least one hand of the human demonstrator 130, and may also be analyzed to determine the pose of the other hand and/or the workpieces 140 and 160. In different embodiments of the disclosed techniques, the hand pose data, workpiece pose data, force data and audio data are used by the computer 120 to generate a robot program for an operation such as pick, move and place. All of this is discussed further below.
The robot motion program generated by the computer 120 is provided to a robot controller 170 which controls a robot 180. The controller 170 typically communicates with the robot 180 via a cable 172, including the controller 170 providing joint motion commands to the robot 180 and receiving joint encoder position data from the robot 180, in a manner known in the art. The robot 180 is fitted with a gripper 182, which may be any type of gripper—such as parallel-jaw, finger-style, magnetic, suction cup (single cup or multiple cups in an array), etc. The controller 170 also provides gripper action commands to the robot 180, such as grip and ungrip. The computer 120 may communicate with the controller 170 wirelessly or via a hard-wire or network connection 122.
The system of
The steps illustrated in the box 210 are of course performed by the human demonstrator in a fairly smooth, continuous sequence. It is therefore often difficult for a system capturing the human demonstration, for the purposes of robot program teaching, to detect and determine the exact moment at which a specific step begins or ends. The techniques of the present disclosure address this difficulty by using both force and vision sensors to detect the human demonstration, and including both motion teaching and state transition logic teaching in embodiments of the methodology.
In box 240, the motion teaching steps recorded by the program generation system (e.g., the system of
Thus, the motion teaching steps in the box 240 can be defined simply as pick (or grasp), move and place. The pick step at box 242 records the position and orientation of the hand 214 (the key points of the hand 214) and the object 216 (although object position/pose may already be known) at the moment when the human demonstrator grasps the object 216. The move step at box 244 records the position and orientation of the hand 214 (again, the key points of the hand 214) as the object 216 is moved from its initial position/pose to its destination position/pose. The place step at box 246 records the position and orientation of the hand 214 and optionally the object 216 at the moment when the human demonstrator releases the object 216 at the destination position/pose.
The motion teaching steps in the box 240 are recorded by the program generation system by visually tracking hand motion and optionally object motion. Detection of the key points of the hand 214 from camera images, and conversion of the hand key point data to gripper pose, were described in detail in the '185 application discussed above.
In box 270, the logic teaching steps recorded by the program generation system (e.g., the system of
Shown in a first oval at the left is a pick state 272. Techniques for detecting that the pick state 272 has been entered will be discussed below in relation to later figures. The pick state 272 has an associated action 274 of closing or activating the robot gripper. A move state 276 is the state in which the robot controller moves the gripper and the object from the initial position to the target position. Again, techniques for detecting a transition indicating that the move state 276 has been entered will be discussed below. Following the move state 276 is a place state 278. The place state 278 has an associated action 280 of opening or deactivating the robot gripper.
The present disclosure introduces the concept of detecting a specific event which triggers a transition from one state to the next in robot program generation from human demonstration. In order to apply this concept, the states and the corresponding transitions must first be defined, which is done in
The state diagram 300 includes a pick state 310. The pick state 310 has an entry action of “close gripper” (or “activate gripper” which is more applicable to suction grippers). That is, when the pick state 310 is entered, the program generation system will generate a command to close or activate the robot gripper. The pick state 310 has a transition 312 to a move state 320. The move state 320 is the state in which the object (or workpiece) is moved from its initial pose (position and orientation) to its target pose. In the move state 320, the program generation system will generate commands causing the robot gripper to move the object through the prescribed motion (as captured by the vision sensor 110).
The move state 320 has a transition 322 to a place state 330, triggered by detection of an event as discussed below. The place state 330 is where the robot gripper releases the object in its target pose. The place state 330 thus has an entry action of “open gripper” (or “deactivate gripper”). That is, when the place state 330 is entered, the program generation system will generate a command to open or deactivate the robot gripper. The place state 330 has a transition 332 to a depart/approach state 340. The depart/approach state 340 is the state in which the robot gripper is free to move with no object, after placing the previous object at its target pose and before going to pick up the next object. The depart/approach state 340 has a transition 342 back to the pick state 310, which is entered when a trigger event (discussed below) is detected.
In the state diagram 300, the depart/approach state 340 could be separated into two states—a depart state and an approach state, with a transition from depart to approach being triggered by the arrival of camera images of the next object to pick up. Yet another state, such as “perch” or “wait”, could even be added between the depart state and the approach state. However, the state diagram 300 is drawn with the combined depart/approach state 340 for the sake of simplicity, as this is sufficient for the following discussion of various types of state transition triggers.
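For illustration, the loop of the state diagram 300 could be represented in software roughly as in the following Python sketch, where the state names, transition table and entry actions are illustrative placeholders rather than the actual program generation code.

```python
from enum import Enum, auto

class State(Enum):
    PICK = auto()
    MOVE = auto()
    PLACE = auto()
    DEPART_APPROACH = auto()

# Next state in the single-loop state diagram, and the gripper entry action
# (if any) generated when each state is entered.  Names are illustrative.
NEXT_STATE = {
    State.PICK: State.MOVE,
    State.MOVE: State.PLACE,
    State.PLACE: State.DEPART_APPROACH,
    State.DEPART_APPROACH: State.PICK,
}
ENTRY_ACTION = {
    State.PICK: "close_gripper",   # or "activate" for a suction gripper
    State.PLACE: "open_gripper",   # or "deactivate"
}

def advance(state, trigger_detected):
    """Advance to the next state when a transition trigger is detected,
    returning the new state and any gripper command to emit."""
    if not trigger_detected:
        return state, None
    new_state = NEXT_STATE[state]
    return new_state, ENTRY_ACTION.get(new_state)
```

Because only one valid transition is defined from each state, each detected trigger event simply advances the loop by one step.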
Likewise, an object center point 418 may be designated in any suitable fashion—such as a center of the top-view area of the object 414, or a known object center point from CAD data transposed to fit the camera images. From the 3D locations of the hand center point 416 and the object center point 418, a distance between the two can be computed. This distance is used in the state transition logic.
A box 420 contains illustrations and descriptions of the state transition logic used in this first method. Box 422 includes an illustration of detection of a pick step according to the method. In this method, the transition to the pick state (from approach) is triggered by the distance from the hand center point 416 to the object center point 418 dropping below a prescribed threshold. The threshold is defined based on what point on the hand 412 is chosen as the hand center point 416, and the nature of the object 414 and the object center point 418. For example, the threshold could be set to a value of 75 millimeters (mm), as it may be physically impossible for the hand center point 416 to get much nearer the object center point 418 than 75 mm.
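A minimal sketch of this distance-threshold trigger, assuming the hand center point 416 and the object center point 418 are available as 3D coordinates in millimeters, might look as follows; the 75 mm value is the example threshold mentioned above.

```python
import numpy as np

PICK_DISTANCE_THRESHOLD_MM = 75.0  # example threshold from the discussion above

def pick_triggered(hand_center, object_center,
                   threshold_mm=PICK_DISTANCE_THRESHOLD_MM):
    """Return True when the hand center point is close enough to the
    object center point to trigger the transition to the pick state."""
    distance = np.linalg.norm(np.asarray(hand_center, dtype=float)
                              - np.asarray(object_center, dtype=float))
    return distance < threshold_mm
```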
In the method of
Box 424 includes an illustration of detection of a move step according to the method of
Mathematically, the transition to the move state is triggered when |h_{c,i} − h_{c,i+1}| > 0, or |O_{c,i} − O_{c,i+1}| > 0, where h_{c,i} and h_{c,i+1} are the 3D coordinates of the hand center point 416 at a time step (camera image) i and a following step i+1, respectively, and likewise for the object center point 418. The differences in absolute value brackets are computed as described for the box 422 above. Referring again to
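This motion-based trigger could be implemented, for example, as sketched below; the small tolerance used in place of the strict greater-than-zero test is an assumption intended to reject key point detection noise and is not part of the description above.

```python
import numpy as np

def move_triggered(prev_point, curr_point, eps_mm=1.0):
    """Return True when the tracked center point (hand or object) has moved
    between consecutive camera images.  A small tolerance eps_mm is assumed
    here in place of the strict "> 0" test to reject detection noise."""
    displacement = np.linalg.norm(np.asarray(curr_point, dtype=float)
                                  - np.asarray(prev_point, dtype=float))
    return displacement > eps_mm
```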
Box 426 includes an illustration of detection of a place step according to the method of
In the method of
Following the triggering of the place state, motion of the hand 412 away from the object 414 can be used to trigger a transition to the depart/approach state, where the motion of the hand 412 is determined in the same manner as the move state discussed above. From the depart/approach state, transition to the pick state is detected as described above relative to the box 422.
The logic teaching method shown in
Box 520 contains an illustration of the second hand 516, specifically the key points and bone segments as detected from the camera image in the manner disclosed in the '185 application. In the box 520, the second hand 516 is not in the “OK” configuration, but rather in a “Ready” configuration, meaning that the human demonstrator is not triggering a state transition. In a box 530, the second hand 516 is shown in the “OK” configuration, meaning that the human demonstrator is triggering a state transition. Because the state diagram 300 is defined to proceed in a continuous loop, with only one valid transition available from each state, the same “OK” sign can be used by the second hand 516 of the human demonstrator to trigger each of the transitions—from approach to pick, from move to place, etc. Other hand gestures besides the “OK” sign could also be used to trigger a state transition.
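By way of example only, a very simple test for the “OK” configuration could compare the thumb tip and index fingertip key points as sketched below; the key point names and the tip-to-tip proximity test are illustrative assumptions rather than the actual gesture classifier.

```python
import numpy as np

def ok_gesture_detected(key_points, touch_threshold_mm=20.0):
    """Rough sketch of detecting an "OK" hand configuration from hand key
    points (a dict of named 3D points).  The key point names and the simple
    tip-to-tip test are illustrative assumptions."""
    thumb_tip = np.asarray(key_points["thumb_tip"], dtype=float)
    index_tip = np.asarray(key_points["index_tip"], dtype=float)
    # In the "OK" configuration the thumb tip and index fingertip touch,
    # closing the circle; in the "Ready" configuration they are apart.
    return np.linalg.norm(thumb_tip - index_tip) < touch_threshold_mm
```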
The logic teaching method shown in
As in the other state transition detection methods described earlier, the verbal command to transition to the pick state causes the system to detect and capture the position and pose of at least the hand 712 (3D coordinates of all detectable key points) at the moment of state transition. This enables the system to compute a gripper position and pose associated with the pick state, with an accompanying command to close or activate the gripper. The system may also optionally detect and capture the pose of the object 714.
At box 720, the hand 712 is moving the object 714 from the pick location to the place location. Just prior to moving the object 714, the human demonstrator would say “move”, causing the system to transition to the move state and capture the motion of at least the hand 712 (and optionally also the object 714). At box 730, the hand 712 is just releasing the object 714 at the place location. Just prior to releasing the object 714, the human demonstrator would say “place”, causing the system to transition to the place state and capture the final pose of the hand 712 (and optionally also the object 714) at the destination location. As discussed earlier, the position and pose/configuration of the hand 712 (3D coordinates of the key points) is readily converted to gripper position and pose by the system, to be included in the robot motion program.
Additional verbal commands could also be recognized by the system, such as “release” or “depart”, and “approach”. However, as discussed previously, these state transitions can be inferred by the system following the place state, according to the state diagram 300.
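A minimal sketch of mapping recognized verbal commands to state transitions is given below. It assumes an upstream speech-to-text step supplies the transcribed utterance from the microphone 112; the keyword and command names are illustrative.

```python
# Map recognized spoken keywords to the taught state and the gripper
# command (if any) accompanying the transition.  How the audio is
# transcribed to text is outside this sketch; the names are illustrative.
VERBAL_COMMANDS = {
    "pick":     ("pick", "close_gripper"),
    "move":     ("move", None),
    "place":    ("place", "open_gripper"),
    "release":  ("depart/approach", None),
    "depart":   ("depart/approach", None),
    "approach": ("depart/approach", None),
}

def transition_from_utterance(utterance):
    """Return (state, gripper_command) when the utterance contains a
    recognized keyword, or None when no transition is commanded."""
    for keyword, transition in VERBAL_COMMANDS.items():
        if keyword in utterance.lower():
            return transition
    return None
```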
Graph 860 conceptually illustrates how the force sensor signals may be analyzed to detect state transitions. The graph 860 plots a vertical force from the force sensor 850 on a vertical axis 862 versus time on a horizontal axis 864. Curve 870 depicts the vertical force versus time. At a time indicated at 880, the pick state transition is detected by a slight rise in the vertical force and a noticeable peak, followed by a decrease in the force to a value lower than before the time 880. These force characteristics are consistent with the human demonstrator 810 grasping the small component part and picking it up, and thus can be used as a state transition trigger to the pick state.
In an extended time indicated at 882, the force value continues along at a lower value than before the pick, as the human demonstrator 810 moves the small component part and begins to place it in its designated target position and orientation. The time 882 corresponds with the move state in the state diagram 300. As in the other state transition logic methods, the transition to the move state follows immediately after the transition to the pick state—i.e., just an instant after the time 880.
At a time indicated at 884, the place state transition is detected by a significant rise in the vertical force and a very noticeable peak, followed by a decrease in the force to a value slightly higher than before the time 884. These force characteristics are consistent with the human demonstrator 810 placing the small component part in its target location and orientation, possibly including applying a downward force to cause the component part to press or snap into place in the assembly workpiece 820. As in the other state transition logic methods, a transition to the release/depart state can be inferred to follow immediately after the transition to the place state—i.e., just an instant after the time 884.
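For illustration, the pick and place events could be located in the vertical force signal by a peak detection step roughly as sketched below (using scipy); treating the first qualifying peak as the pick and the most prominent later peak as the place is a simplification of the force characteristics described above, and the prominence value is an assumed tuning parameter.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_pick_and_place(times, vertical_force, min_prominence=1.0):
    """Locate candidate pick and place events in the vertical force signal
    from the sensor beneath the workpiece.  The pick appears as a small
    early peak, the place as a much larger later peak; min_prominence is an
    assumed tuning value."""
    force = np.asarray(vertical_force, dtype=float)
    peaks, props = find_peaks(force, prominence=min_prominence)
    if len(peaks) < 2:
        return None, None                    # not enough events detected

    pick_idx = peaks[0]                      # first qualifying peak: pick
    later_mask = peaks > pick_idx
    later_peaks = peaks[later_mask]
    later_prom = props["prominences"][later_mask]
    place_idx = later_peaks[np.argmax(later_prom)]  # largest later peak: place
    return times[pick_idx], times[place_idx]
```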
In an alternate embodiment, a force sensor is provided in a tool used by the human demonstrator 810, in the manner shown in
In the case of both the tool-mounted force sensor and the force sensor 850 situated below the workpiece 820, the force sensor signals could include any or all of the three axial forces and any or all of the three torques, as different components of force and/or torque may contain the most significant characteristics of the state transitions.
Five different methods for state transition logic teaching have been described above. These methods—and combinations thereof—enable precise capturing of the moment of state transition, which in turn enables consistent and accurate capturing of motions from human demonstration, specifically the 3D motion of the hand of the demonstrator. Techniques for improving the quality of the motion teaching are discussed below.
Graph 910 is a 3D graph including a curve 920 plotting the motion of a human hand as it moves an object, at a sequence of points, as the hand is used to demonstrate a pick, move and place operation on the object. The points on the curve 920 have x/y/z coordinates plotted on the three orthogonal axes. The motion of the hand depicted by the curve 920 in this case involves moving the object from a start point 922 (the pick location), up and over an obstacle and to an end point 924 (the place location). The curve 920 includes an area 926 where the human demonstrator, after picking up the object, apparently hesitated, then lowered the object slightly before proceeding with the rest of the movement. The singularity point and the reversal of direction observed in the area 926 are definitely not desirable to include in the robot motion program.
Graph 930 is a 3D graph including a curve 940 plotting a refined motion program for the pick, move and place operation demonstrated by the human in the curve 920. The curve 940 is computed using the original points from the curve 920 as a basis, with a least squares interpolation used to create a new set of points which removes unnecessary or extraneous excursions from the original points, and a spline interpolation used to compute the curve 940 through the new set of points. In the embodiment shown in the graph 930, the curve 940 is used as the motion for the “move” state in the robot program. The least squares fitting and spline interpolation technique of the graph 930 may be applicable in situations where the object “move” motion must pass between multiple obstacles—such as over one and under another.
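One possible realization of this refinement is sketched below, using scipy's smoothing spline routines as a stand-in for the least squares and spline interpolation steps described above; the smoothing factor is an assumption to be tuned per application.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def refine_move_path(points, smoothing=50.0, num_output=100):
    """Fit a smoothing (least squares) spline through the demonstrated hand
    points and resample it, removing small extraneous excursions such as
    the hesitation in the area 926.  The smoothing factor is an assumed
    tuning parameter."""
    pts = np.asarray(points, dtype=float)      # demonstrated points, shape (N, 3)
    tck, _ = splprep(pts.T, s=smoothing)       # least squares spline fit
    u = np.linspace(0.0, 1.0, num_output)
    x, y, z = splev(u, tck)                    # resample the smoothed curve
    return np.column_stack([x, y, z])
```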
Graph 950 is a 3D graph including multiple line segments comprising an output robot motion, where the multiple line segments are constructed using the start point 922, the end point 924, and a highest point (maximum z coordinate) 928 from the original points on the curve 920. A first line segment is created by projecting the start point 922 directly upward (same x and y coordinates) to a point 960 which has the same z coordinate as the highest point 928. A second line segment is created from the point 960 to a point 962 which is directly above the end point 924. The second line segment is horizontal, passing through the highest point 928 on its way from the point 960 to the point 962. A third and final line segment is vertically downward from the point 962 to the end point 924. In the embodiment shown in the graph 950, the three line segments (from the start point 922 to the point 960 to the point 962 to the end point 924) are used as the motion for the “move” state in the robot program. The highest point line segment fitting technique of the graph 950 may be applicable in situations where the object “move” motion must simply pass over one or more obstacles.
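A minimal sketch of this line segment construction is given below, assuming the demonstrated hand positions are available as an ordered array of x/y/z points.

```python
import numpy as np

def over_obstacle_path(points):
    """Sketch of the line segment construction of the graph 950: go straight
    up from the pick point to the height of the highest demonstrated point,
    move horizontally to above the place point, then straight down."""
    pts = np.asarray(points, dtype=float)      # demonstrated points, shape (N, 3)
    start, end = pts[0], pts[-1]
    z_max = pts[:, 2].max()                    # highest demonstrated z coordinate

    up = np.array([start[0], start[1], z_max])    # directly above the pick point
    over = np.array([end[0], end[1], z_max])      # directly above the place point

    # Waypoints of the three segments: start -> up -> over -> end.
    return np.vstack([start, up, over, end])
```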
Box 1020 contains illustrations of how a human hand pose can be transposed to a suboptimal suction gripper pose, in the manner discussed above. In the lower portion of the box 1020 is an isometric view illustration of a top surface 1014a of the workpiece 1014 as defined by a point cloud provided by a 3D camera used as the vision sensor 110. Although the points in the point cloud on the top surface 1014a are not all perfectly coplanar, a surface normal can be computed which is quite accurate. The surface normal to the top surface 1014a would appear vertical in
Box 1050 contains illustrations of how the human hand pose can be adjusted based on workpiece surface normal to provide an optimal suction gripper pose, according to the present disclosure. In the lower portion of the box 1050 is the same isometric view illustration of the top surface 1014a as in the box 1020. In the box 1050, a gripper axis vector 1060 is aligned with the surface normal vector, rather than being computed from the hand pose. In the upper portion of the box 1050, the suction gripper 1040 is shown in an orientation according to the vector 1060. It can be clearly seen that the gripper 1040 is properly oriented normal to the workpiece 1014 when the axis of the gripper 1040 is aligned with the refined vector 1060. Refining the gripper axis vector based on the object surface normal can improve grasp quality, especially in the case of suction cup grippers.
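By way of illustration, the surface normal of the nominally planar top surface could be estimated from the point cloud by a least squares plane fit, as sketched below; the sign convention chosen for the normal is an assumption.

```python
import numpy as np

def surface_normal(surface_points):
    """Estimate the normal of a nominally planar patch of a point cloud
    (e.g., the top surface of the workpiece) by a least squares plane fit
    using SVD.  The resulting vector can replace the hand-derived gripper
    axis so the suction gripper approaches normal to the surface."""
    pts = np.asarray(surface_points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance across the patch, i.e., the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # Sign convention (an assumption): make the normal point out of the
    # surface toward the camera, taken here as the +z direction.
    if normal[2] < 0:
        normal = -normal
    return normal / np.linalg.norm(normal)
```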
At box 1120, an operation is performed on a workpiece by the human demonstrator, including using a hand to move the workpiece from an initial position and orientation to a final position and orientation. This was also discussed previously—particularly relative to the pick, move and place operation shown in
At box 1140, data from the vision sensor/camera, and other available sensors, is analyzed to create logic data. As discussed earlier, the logic data is used to supplement the hand motion data by recognizing certain events. The logic teaching, performed concurrently with the motion teaching, adds precision and removes ambiguity in the creation of the gripper motion program. At box 1150, the hand motion data from the box 1130, and the logic data from the box 1140 are used to generate a motion program to cause a robot gripper to perform the operation on the workpiece, where the motion program includes a sequence of steps each having a gripper position and orientation. Each step of the motion program may optionally include a grip/ungrip command. The grip/ungrip commands are only needed at the pick and place locations in a pick, move and place operation.
At box 1142, state transitions are detected in the operation being demonstrated, based on a defined state machine model as in
The state transition detection at the box 1142 provides a specific form of the logic data teaching at the box 1140. When state transition detection is performed, the state transition logic is used to precisely define certain events in the gripper motion program—such as the steps (pick and place) at which gripper velocity drops to zero, and the steps (again pick and place) at which a grip or ungrip command is issued. The definition of the state machine model, in advance, allows each state transition to be anticipated such that a particular, recognizable event triggers the appropriate transition.
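For illustration only, the following sketch shows how detected state transition events could annotate the sequence of gripper poses with grip and ungrip commands to form the motion program; the step representation and command names are illustrative assumptions.

```python
def build_motion_program(gripper_poses, events):
    """Assemble a gripper motion program from the sequence of gripper poses
    (derived from hand poses) and the detected state transition events.
    `events` is assumed to map a step index to a transition name ("pick" or
    "place"); all names here are illustrative."""
    program = []
    for i, pose in enumerate(gripper_poses):
        step = {"pose": pose, "command": None}
        if events.get(i) == "pick":
            step["command"] = "close_gripper"   # grip command at the pick step
        elif events.get(i) == "place":
            step["command"] = "open_gripper"    # ungrip command at the place step
        program.append(step)
    return program                              # ordered steps of the motion program
```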
At box 1132, the hand motion data created at the box 1130 may optionally include motion refinements as discussed in connection with
Transforming hand pose data directly into the gripper motion program offers two significant benefits over prior methods. First, by eliminating workpiece detection from the images for motion teaching, the presently disclosed method reduces computational burden and program complexity. Second, eliminating workpiece detection from the images also solves the problem of workpiece occlusion by the human's hand. Existing systems which require workpiece pose data for robot motion teaching typically need image data from more than one camera in order to ensure workpiece visibility. However, fusing coordinate data from multiple cameras requires complex calibration steps. As long as the object/workpiece pose (position and orientation) is known during the approach state, a sequence of hand pose steps (the hand motion data) is sufficient to define a robot gripper motion program.
After the gripper motion program is created at the box 1150, the motion program is transferred from the computer used for teaching demonstration (the computer 120 in
Throughout the preceding discussion, various computers and controllers are described and implied. It is to be understood that the software applications and modules of these computers and controllers are executed on one or more computing devices having a processor and a memory module. In particular, this includes the processors in the computer 120 and the robot controller 170 shown in
As outlined above, the disclosed techniques for robot program generation by human demonstration make robot motion programming faster, easier and more intuitive than previous techniques, while providing robustness against variations and vagaries in human hand motion through discrete state transition detection and motion path improvements.
While a number of exemplary aspects and embodiments of robot program generation by human demonstration have been discussed above, those of skill in the art will recognize modifications, permutations, additions and sub-combinations thereof. It is therefore intended that the following appended claims and claims hereafter introduced are interpreted to include all such modifications, permutations, additions and sub-combinations as are within their true spirit and scope.