This invention relates to a control apparatus and a control method for a robot that performs press-fitting and other operations.
There have been known devices that are mounted on the hands of robots and reduce the reaction force during a press-fitting operation (for example, see Patent Literature 1). Patent Literature 1 discloses a press-fitting device that press-fits an axial component into a press-fitting hole formed in a workpiece into which the axial component is to be press-fitted. This press-fitting device includes press-fitting means that is swingably supported by a mounting member with a pair of springs therebetween. Thus, when the axial component receives an eccentric load from the edge of the press-fitting hole, the press-fitting means swings and reduces the press-fitting reaction force.
Patent Literature 1: Japanese Unexamined Patent Publication No. 2006-116669
However, the device described in Patent Literature 1 only reduces the press-fitting reaction force. For example, if there is a misalignment or the like between the axial component and the press-fitting hole due to the individual differences between axial components, it is difficult to press-fit the axial component even if the device described in Patent Literature 1 is used.
An aspect of the present invention is a robot control apparatus configured to control a robot so as to mount a first component supported by a hand of the robot driven by an actuator to a second component, including: a memory unit configured to store a correspondence-relation between a plurality of half-mounted-states of the first component and an optimal action of the robot giving the highest reward for each of the plurality of half-mounted-states obtained beforehand by reinforcement learning; a state detecting unit configured to detect a half-mounted-state of the first component; and an actuator controller configured to identify an optimal action of the robot corresponding to the half-mounted-state detected by the state detecting unit based on the correspondence-relation stored in the memory unit and to control the actuator in accordance with the optimal action.
Another aspect of the present invention is a robot control method for controlling a robot so as to mount a first component supported by a hand of the robot driven by an actuator to a second component. The robot control method includes: a reinforcement learning step of acquiring a correspondence-relation between a plurality of half-mounted-states of the first component and an optimal action of the robot giving the highest reward for each of the plurality of half-mounted-states by mounting the first component to the second component multiple times by driving the hand; and a mounting step of, when mounting the first component to the second component, detecting a half-mounted-state of the first component, identifying an optimal action corresponding to the detected half-mounted-state based on the correspondence-relation acquired in the reinforcement learning step, and controlling the actuator in accordance with the identified optimal action.
According to the present invention, reinforcement learning is used. Thus, even if there is a misalignment or the like between the first component and the second component, the first component can be easily mounted on the second component by actuating the hand of the robot.
An embodiment of the present invention will be described with reference to
The robot 1 is, for example, a vertical articulated robot having multiple rotatable arms 11, and the front arm end is provided with a working hand 12. The robot 1 has multiple (for convenience, only one is shown) servo motors 13 for actuating the robot. Each servo motor 13 is provided with an encoder 14 that detects the rotation angle of the servo motor 13. The detected rotation angle is fed back to the controller 2, which then feedback-controls the position and posture of the hand 12 in a three-dimensional space.
The controller 2 includes an arithmetic processing unit including a CPU, ROM, RAM, and other peripheral circuits. The controller 2 outputs a control signal to the servo motor 13 in accordance with a program stored in the memory beforehand, to control the operation of the robot 1. While the robot 1 performs various types of operations, the robot 1 according to the present embodiment is configured to perform, among others, mounting of a workpiece on a component.
Prior to mounting the workpiece 100, a reference workpiece shape is defined. For example, if the workpiece 100 is a tube as in the present embodiment, a cylindrical reference workpiece shape (dotted line) around the axis CL1 is defined. Also, a reference point P0 is set at the front end of the hand 12. The workpiece is mounted by controlling the position of the reference point P0. For example, as shown in
The tubular workpiece 100 has an inherent bending tendency and therefore there are individual differences in shape between workpieces. Such individual differences also occur due to the differences between the molding conditions or the like of workpieces 100. Further, the physical properties (elastic modulus, etc.) of the workpiece 100 may change due to a change in temperature or humidity during operation. Consequently, as shown in
An example approach to avoid the bend, buckling, or the like of the workpiece 100 is to dispose, on the hand 12, a reaction force receiver that reduces the press-fitting reaction force. However, the disposition of such a receiver complicates the configuration of the hand 12 and upsizes the hand 12. Also, even if the force acting on the hand 12 is controlled by disposing, on the hand 12, the reaction force receiver or a sensor or the like that detects such a force (force control), it is difficult to quickly press-fit the flexible workpiece 100, such as a tube. In particular, if there is a misalignment between the workpiece 100 and the component 101, it is difficult to press-fit the workpiece 100 while resolving the misalignment. For these reasons, in the present embodiment, the robot control apparatus is configured as follows such that the workpiece 100 is quickly press-fitted without complicating the configuration of the hand 12.
As shown in
As shown in
The input unit 16 in
The controller 2 includes a memory unit 21 and a motor control unit 22 as functional elements. The motor control unit 22 includes a learning control unit 23 that controls the servo motor 13 during reinforcement learning and a normal control unit 24 that controls the servo motor 13 during a normal workpiece mounting operation. The memory unit 21 stores a correspondence-relation between half-mounted-states of the workpiece 100 and actions of the robot 1, that is, a Q-table (discussed later). In the reinforcement learning step, the learning control unit 23 drives the servo motor 13 to mount the workpiece 100 on the component 101 multiple times. Reinforcement learning will be described below.
Reinforcement learning is a type of machine learning that addresses the problem in which an agent in an environment observes the current state and determines an action to be taken. The agent obtains a reward from the environment by selecting an action. While there are various reinforcement learning techniques, Q-learning is used in the present embodiment. Q-learning is a technique that performs learning such that the action having the highest action evaluation function value (Q-value), that is, the action that receives the greatest amount of reward, is taken in a certain environment.
The Q-value is updated by the following formula (I) on the basis of a state $s_t$ and an action $a_t$ at time $t$.
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right] \quad \text{(I)}$$
In the formula (I), α is a coefficient (learning rate) representing the degree to which the Q-value is updated, and γ is a coefficient (discount rate) representing the degree to which the result of an event that may occur in the future is reflected. The coefficients α and γ are adjusted on the basis of experience and set within 0<α≤1 and 0<γ≤1, respectively. Also, r is an index (reward) for evaluating the action $a_t$ with respect to a change in the state $s_t$ and is set such that the Q-value is increased when the state $s_t$ becomes better.
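As an illustration of formula (I), the following Python sketch applies a single update to a dictionary-based Q-table. The function name `q_update`, the values chosen for α and γ, and the example state and action labels are assumptions made for this example and are not values taken from the embodiment.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply formula (I) once and return the updated Q(state, action).

    alpha (learning rate) and gamma (discount rate) are illustrative
    values within 0 < alpha <= 1 and 0 < gamma <= 1.
    """
    # max over the actions available in the next state
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # bracketed term of formula (I)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# All Q-values start at 0, as in the initial state of learning.
q = defaultdict(float)
first = q_update(q, "MD4", "a2", reward=1.0,
                 next_state="MD5", next_actions=["a1", "a2", "a3"])
second = q_update(q, "MD5", "a1", reward=0.0,
                  next_state="MD4", next_actions=["a2"])
print(first, second)
```

The second call shows how a Q-value learned for one state-action pair propagates backward through the discounted maximum term.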
The first step in performing a mounting operation as reinforcement learning is to define the reference movement path along which the workpiece 100 moves in the period from the start to the end of its mounting.
Specifically, to press-fit the flexible workpiece 100 into the outside of the component 101, the operator first grasps the front end of the workpiece 100 and inserts the front end into the peripheral surface of the component 101 obliquely at a predetermined angle θ (e.g., 45°) with respect to the axis CL3. The operator then rotates the workpiece 100 so that the central axis CL2 of the workpiece 100 is aligned with the axis CL3, and then presses the workpiece 100 along the axis CL3 until the workpiece reaches a predetermined position while keeping the posture of the workpiece. Considering this aspect, the reference movement path PA used when the robot 1 press-fits the workpiece 100 is defined on the YZ-plane, as shown in
In
To cause the robot 1 to perform a workpiece mounting operation as reinforcement learning (Q-learning), it is necessary to define the states of the workpiece 100 in the period from the start to the end of mounting of the workpiece 100 (the half-mounted-states of the workpiece 100) and the actions that the robot 1 can take. First, the half-mounted-states of the workpiece 100 will be described.
The amount of change ΔFz of the force is the difference between the force Fz acting on the workpiece in the current step STt and the force Fz that has acted on the workpiece in the immediately preceding step STt−1. For example, when the current step is ST3, the difference between the force Fz acting in step ST3 and the force Fz that has acted in the immediately preceding step ST2 is ΔFz. By using the amount of change ΔFz of the force as a parameter, the state can be identified accurately without being affected by the individual differences between workpieces 100. If the force Fz itself is used as a parameter, the threshold needs to be reset each time the type of workpiece changes. On the other hand, in the present embodiment, the amount of change ΔFz of the force is used as a parameter. Thus, even if the type of workpiece changes, the threshold does not need to be reset, and the state is easily identified. The moment Mx becomes a positive value when a rotation force in the positive Y-direction acts on the hand 12, and it becomes a negative value when a rotation force in the negative Y-direction acts on the hand 12. By determining whether the value of the moment Mx is positive or negative, the direction of misalignment of the workpiece 100 with respect to the axis CL3 can be identified.
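The two parameters described above can be computed in a few lines. The following is a minimal sketch, assuming that the force Fz is logged once per step in a Python list; the function names and the optional `deadband` parameter are hypothetical and are introduced only for illustration.

```python
def force_change(fz_history):
    """Return dFz: Fz in the current step minus Fz in the preceding step."""
    if len(fz_history) < 2:
        raise ValueError("at least two steps are needed to compute dFz")
    return fz_history[-1] - fz_history[-2]


def rotation_force_direction(mx, deadband=0.0):
    """Map the sign of the moment Mx to the direction of the rotation force.

    A positive Mx corresponds to a rotation force in the positive
    Y-direction and a negative Mx to the negative Y-direction.
    `deadband` is an assumed tolerance around zero.
    """
    if mx > deadband:
        return "+Y"
    if mx < -deadband:
        return "-Y"
    return "none"


fz_log = [2.0, 3.5, 5.0]  # hypothetical Fz readings for steps ST1 to ST3
print(force_change(fz_log), rotation_force_direction(-0.2))
```

Because only the difference between consecutive steps is used, adding a constant offset to every Fz reading leaves ΔFz unchanged, which is consistent with the statement that the threshold does not need to be reset when the type of workpiece changes.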
In
Mode MD5 is a state in which the amount of change ΔFz of the force is greater than ΔF1 and the moment Mx is equal to or greater than M2 and equal to or smaller than M1. As shown in
The learning control unit 23 identifies the current half-mounted-state of the workpiece 100, that is, which of the modes MD1 to MD6 the workpiece 100 is in, on the basis of the force Fz and moment Mx detected by the force detector 15 or, more precisely, on the basis of the amount of change ΔFz of the force and the moment Mx.
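The condition stated for mode MD5 (ΔFz greater than ΔF1, and Mx between M2 and M1 inclusive) can be written as a simple predicate. The sketch below covers only MD5, because that is the only mode whose condition is spelled out in this text; the `Thresholds` class and the numeric threshold values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    dF1: float  # threshold on the amount of change dFz
    M1: float   # upper bound on the moment Mx
    M2: float   # lower bound on the moment Mx


def is_mode_md5(d_fz, mx, th):
    """Return True when dFz > dF1 and M2 <= Mx <= M1 (mode MD5)."""
    return d_fz > th.dF1 and th.M2 <= mx <= th.M1


# Placeholder values; real thresholds would be tuned for the workpiece.
th = Thresholds(dF1=0.5, M1=0.1, M2=-0.1)
print(is_mode_md5(1.0, 0.0, th), is_mode_md5(1.0, 0.3, th))
```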
The reward r in the formula (I) is set using a reward table stored in the memory beforehand, that is, a reward table defined by the correspondence-relation between the state in the current step STt and the state in the immediately preceding step STt−1.
If there is no change between the state in the current step STt and the state in the immediately preceding step STt−1 (e.g., both the state in the current step STt and the state in the immediately preceding step STt−1 are the buckling state MD1 or MD3), a predetermined value (e.g., −3) is set as the reward r (specifically, the reward r11, r22, r33, r44, r66). In this case, it is determined that the state would not be improved any more, and therefore a negative reward r is given. Otherwise (if the state is changed to a state other than the normal state MD5), 0 is set as the reward r. Note that the value of the reward r may be properly changed on the basis of the result of the actual press-fitting operation. The learning control unit 23 sets the reward r of the formula (I) in each step in accordance with the reward table in
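A reward table of this kind can be sketched as a lookup keyed by the preceding and current modes. In the sketch below, the value −3 for an unchanged state and the value 0 for a change to a state other than MD5 follow the text. The reward for any transition that ends in the normal state MD5 is not given here, so `ASSUMED_NORMAL_REWARD` is a hypothetical positive placeholder, and applying it to the MD5-to-MD5 case is likewise an assumption.

```python
MODES = ["MD1", "MD2", "MD3", "MD4", "MD5", "MD6"]
NORMAL = "MD5"
UNCHANGED_PENALTY = -3.0      # example value given in the text
ASSUMED_NORMAL_REWARD = 1.0   # hypothetical; not specified in the text


def build_reward_table():
    """Return {(previous_mode, current_mode): reward} for all 36 pairs."""
    table = {}
    for prev in MODES:
        for cur in MODES:
            if cur == NORMAL:
                reward = ASSUMED_NORMAL_REWARD  # assumption, see above
            elif cur == prev:
                reward = UNCHANGED_PENALTY      # r11, r22, r33, r44, r66
            else:
                reward = 0.0                    # changed, but not to MD5
            table[(prev, cur)] = reward
    return table


reward_table = build_reward_table()
print(reward_table[("MD1", "MD1")], reward_table[("MD1", "MD3")])
```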
Next, the action of the robot 1 during mounting of the workpiece will be described. First, as shown in
For example, if the position of the front end of the hand (reference point P0) is point P1 on the reference movement path PA in
The directions in which the hand 12 can move (the angles indicating the movement directions) and the amount of movement of the hand 12 are stored in the memory beforehand. For example, 0° and ±45° with respect to the axis CL1 are set as the angles indicating the movement directions, and the length corresponding to the distance between the adjacent dots is set as the amount of movement. The learning control unit 23 operates the robot 1 such that a higher reward is obtained in accordance with those set conditions. The robot 1 is able not only to move the hand 12 but also to rotate it around the X-axis. Accordingly, the amount of rotation around the X-axis with respect to the movement direction of the hand 12 is also set in the controller 2.
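The stored movement directions and movement amount can be turned into hand displacements as sketched below. The displacement is expressed in a local frame (a component along the axis CL1 and a component perpendicular to it), so no assumption is made about how CL1 is oriented in space. The rotation amounts in `ROTATIONS_DEG` and the pairing of three directions with three rotation amounts are assumptions made for illustration; the text does not state how the candidate actions are composed.

```python
import math
from itertools import product

ANGLES_DEG = (-45.0, 0.0, 45.0)   # movement directions relative to CL1
ROTATIONS_DEG = (-5.0, 0.0, 5.0)  # hypothetical rotation amounts about X


def displacement(angle_deg, step):
    """Return (along_axis, lateral) displacement for one move of length step."""
    rad = math.radians(angle_deg)
    return (step * math.cos(rad), step * math.sin(rad))


# Assumed composition of candidate actions: direction x rotation amount.
candidates = list(product(ANGLES_DEG, ROTATIONS_DEG))

along, lateral = displacement(45.0, 1.0)
print(len(candidates), round(along, 3), round(lateral, 3))
```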
An operation as reinforcement learning can be performed by applying the nine possible actions a1 to a9 to each of the six possible half-mounted-states of the workpiece 100 (modes MD1 to MD6). However, in this case, a great number of state-action combinations are made, and it takes much time to perform the reinforcement learning step. For this reason, to reduce the time required to perform the reinforcement learning step, it is preferred to narrow down the actions in reinforcement learning.
The narrowing-down of actions is performed, for example, by causing an operator skilled in mounting a workpiece to mount a workpiece manually and grasping the pattern of the actions taken by him or her beforehand. Specifically, if there are actions that the operator has not selected in steps ST1 to ST20 in the period from the start to the end of mounting of the workpiece 100, such actions are removed. Thus, the actions are narrowed down.
For example, in steps ST1 to ST9 and steps ST13 to ST20 in
The actions applicable in steps ST1 to ST20 are set through the input unit 16 beforehand. From these applicable actions, the learning control unit 23 selects an action that allows a positive reward to be obtained and causes the robot 1 to take the selected action. Each time it selects an action, it calculates the Q-value using the formula (I). The workpiece mounting operation as reinforcement learning is repeated until the Q-value converges in each of steps ST1 to ST20.
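The effect of narrowing down the actions can be estimated with a short calculation. The sketch below assumes the per-step assignment stated later in this description (actions a1 to a3 in steps ST1 to ST9 and ST13 to ST20, actions a4 to a6 in steps ST10 to ST12) and counts state-action combinations over all twenty steps; the function name `applicable_actions` is hypothetical.

```python
def applicable_actions(step):
    """Return the actions allowed in step ST<step> after narrowing down."""
    if not 1 <= step <= 20:
        raise ValueError("step must be between 1 and 20")
    if 10 <= step <= 12:
        return ("a4", "a5", "a6")
    return ("a1", "a2", "a3")


STEPS = range(1, 21)  # ST1 to ST20
NUM_MODES = 6         # MD1 to MD6
NUM_ALL_ACTIONS = 9   # a1 to a9

# Combinations to learn without and with narrowing down.
full = len(STEPS) * NUM_MODES * NUM_ALL_ACTIONS
narrowed = sum(NUM_MODES * len(applicable_actions(s)) for s in STEPS)
print(full, narrowed)
```

Under these assumptions, narrowing reduces the number of combinations to one third of the original count.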
First, in S11, the normal control unit 24 detects the current half-mounted-state of the workpiece 100, on the basis of a signal from the force detector 15. That is, it detects to which of modes MD1 to MD6 the workpiece 100 corresponds. Then, in S12, the normal control unit 24 reads a Q-table QT corresponding to the current step STt from the memory unit 21 and selects an action having the highest Q-value with respect to the detected half-mounted-state of the workpiece 100. Then, in S13, the normal control unit 24 outputs a control signal to the servo motor 13 so that the robot 1 takes the selected action.
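The selection in S12 amounts to a greedy lookup in the Q-table for the current step. The following minimal sketch assumes that each Q-table is stored as a dictionary keyed by (mode, action) pairs; this layout, the function name, and the numeric Q-values are hypothetical.

```python
def select_optimal_action(q_tables, step, mode):
    """Return the action with the highest Q-value for `mode` in step `step`."""
    q_table = q_tables[step]  # Q-table QT for the current step
    candidates = {a: q for (m, a), q in q_table.items() if m == mode}
    if not candidates:
        raise KeyError(f"no actions stored for mode {mode} in step {step}")
    return max(candidates, key=candidates.get)


# Hypothetical learned values for step ST3.
q_tables = {
    3: {
        ("MD5", "a1"): 0.8,
        ("MD5", "a2"): 0.1,
        ("MD4", "a1"): -0.2,
        ("MD4", "a3"): 0.6,
    }
}
print(select_optimal_action(q_tables, 3, "MD4"))
```

Steps S11 and S13 (detecting the mode and commanding the servo motor) depend on hardware interfaces and are therefore left out of the sketch.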
A specific operation of the robot control apparatus according to the embodiment of the present invention will be described along with a robot control method.
First, before performing the reinforcement learning step, a skilled operator mounts the workpiece 100 to the component 101 manually as a prior step. At this time, the action pattern is analyzed while changing the state of the workpiece 100 to modes MD1 to MD6. Thus, the reference movement path PA (
When the prior step is complete, the reinforcement learning step is performed. In the reinforcement learning step, the learning control unit 23 outputs a control signal to the servo motor 13 to cause the robot 1 to actually repeatedly mount the workpiece 100. At this time, the learning control unit 23 selects one of the multiple actions set in each of steps ST1 to ST20 beforehand and controls the servo motor 13 so that the robot 1 takes that action. The learning control unit 23 also grasps a change in the state in accordance with a signal from the force detector 15 and determines a reward r based on the change in the state with reference to the predetermined reward table (
Then, using the reward r, the learning control unit 23 calculates a Q-value corresponding to the state and action in accordance with the formula (I) in each of steps ST1 to ST20.
In the initial state, in which the reinforcement learning has been started, the Q-value is 0, and the learning control unit 23 randomly selects an action in each of steps ST1 to ST20. As the reinforcement learning proceeds, the learning control unit 23 preferentially selects actions by which a higher reward r is obtained, and the Q-values of specific actions are gradually increased with respect to the states in steps ST1 to ST20. For example, if a bend or buckling (modes MD1, MD3, MD4, MD6) of the workpiece 100 due to a misalignment is corrected, a high reward r is obtained. Accordingly, the Q-value of an action that corrects the bend or buckling is increased. The Q-value gradually converges to a constant value (
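One common way to obtain this behavior (random selection at first, then a growing preference for high-value actions) is an epsilon-greedy rule. The text does not name the selection rule used by the learning control unit 23, so the sketch below is an assumed stand-in rather than the method of the embodiment; the function name and the example values are hypothetical.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over q_row, a dict mapping action -> Q-value.

    With probability epsilon a random action is explored; otherwise the
    action with the highest Q-value is taken, with ties broken at random.
    """
    actions = list(q_row)
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q_row.values())
    ties = [a for a in actions if q_row[a] == best]
    return rng.choice(ties)


rng = random.Random(0)
# At the start every Q-value is 0, so the choice is effectively random.
initial = choose_action({"a1": 0.0, "a2": 0.0, "a3": 0.0}, 0.0, rng)
# Once learning has progressed, the highest-valued action is preferred.
later = choose_action({"a1": 0.0, "a2": 0.7, "a3": 0.1}, 0.0, rng)
print(initial, later)
```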
When the reinforcement learning step is complete, the normal control unit 24 mounts the workpiece 100 as a mounting step. Specifically, the normal control unit 24 detects the half-mounted-state of the workpiece 100 in the current step STt in accordance with a signal from the force detector 15 (S11). The normal control unit 24 can identify the current step among ST1 to ST20, for example, in accordance with a signal from the encoder 14. The normal control unit 24 also selects, as the optimal action, an action having the highest Q-value from among multiple actions corresponding to the half-mounted-states of the workpiece 100 set in the Q-table (S12) and controls the servo motor 13 so that the robot 1 takes the optimal action (S13).
Thus, for example, if a misalignment occurs between the workpiece 100 and the component 101 due to the individual differences between workpieces 100, the normal control unit 24 is able to detect the misalignment and to cause the robot 1 to operate such that the robot 1 takes a proper action that corrects the misalignment. That is, the robot 1 is able to take the optimal action in accordance with a change in the state and to favorably press-fit the workpiece 100 into the component 101, regardless of the individual differences between workpieces 100. Even if the workpiece 100 is configured as a flexible tube, the normal control unit 24 can cause the robot 1 to press-fit the workpiece 100 while easily and properly correcting a bend or buckling of the workpiece 100.
According to the embodiment of the present invention, the following advantageous effects can be obtained:
(1) The robot control apparatus according to the embodiment of the present invention controls the robot 1 so that the workpiece 100 supported by the hand 12 of the robot 1 driven by the servo motor 13 is mounted on the component 101. The robot control apparatus includes the memory unit 21 that stores the correspondence-relation between the half-mounted-states (MD1 to MD6) of the workpiece obtained by the reinforcement learning beforehand and the optimal actions (a1 to a6) of the robot 1 that give the highest rewards to the half-mounted-states of the workpiece (Q-table), the force detector 15 that detects the half-mounted-state of the workpiece 100, and the normal control unit 24 that identifies the optimal action of the robot 1 corresponding to the half-mounted-state of the workpiece detected by the force detector 15 on the basis of the Q-table stored in the memory unit 21 and controls the servo motor 13 in accordance with this optimal action (
As seen above, the robot control apparatus controls the servo motor 13 with reference to the Q-table obtained by the reinforcement learning. Thus, even if there is a misalignment between the central axis CL2 of the workpiece 100 and the axis CL3 of the component 101 due to the individual differences between workpieces 100, such as a bending tendency, the robot control apparatus is able to cause the robot 1 to easily and quickly press-fit the workpiece 100 into the component 101 while correcting the misalignment, without causing a bend, buckling, or the like in the workpiece 100. Also, there is no need to separately dispose a reaction force receiver or the like on the hand 12. This simplifies the configuration of the hand 12 and avoids increasing its size.
(2) The optimal action of the robot 1 is defined by a combination of the angle indicating the movement direction of the hand 12, the amount of movement of the hand 12 along the movement direction, and the amount of rotation of the hand 12 with respect to the movement direction (
(3) The force detector 15 detects the translational forces Fx, Fy, and Fz and the moments Mx, My, and Mz acting on the hand 12, and identifies the half-mounted-state of the workpiece 100, on the basis of the detected translational force Fy and moment Mx (
(4) The memory unit 21 stores the correspondence-relation between the multiple states of the workpiece 100 in the period from the start to the end of mounting of the workpiece 100 and the optimal actions of the robot 1, that is, the Q-table (FIG. 10A and
(5) The robot control method according to the embodiment of the present invention is a method for controlling the robot 1 so that the workpiece 100 supported by the hand 12 of the robot 1 driven by the servo motor 13 is mounted on the component 101 (
(6) The robot control method according to the embodiment of the present invention further includes the prior step of mounting, by the operator, the workpiece 100 on the component 101 prior to the reinforcement learning step. The actions of the robot 1 in the reinforcement learning step are determined on the basis of the action pattern of the operator grasped in the prior step. Thus, the robot 1 is able to take actions similar to those of the skilled operator. Also, the actions of the robot 1 can be narrowed down such that the actions a1 to a3 are taken in steps ST1 to ST9 and steps ST13 to ST20 and the actions a4 to a6 are taken in steps ST10 to ST12. This reduces the time required for the reinforcement learning step and allows the robot 1 to be controlled efficiently.
The above embodiment can be modified into various forms, and modifications will be described below. While, in the above embodiment, the controller 2 configured as a robot controlling apparatus includes the learning control unit 23 and normal control unit 24 and the learning control unit 23 performs a workpiece mounting operation as reinforcement learning, a different controller may perform such a workpiece mounting operation in place of the learning control unit 23. That is, the Q-table indicating the correspondence-relation between the half-mounted-states of the workpiece 100 and the optimal actions of the robot 1 may be obtained from the different controller and stored in the memory unit 21 of the robot control apparatus serving as a memory unit. For example, the same Q-table may be stored in the memory units 21 of mass-produced robot controllers at the time of shipment from the factory. Accordingly, the learning control unit 23 may be omitted from the controller 2 (
While, in the above embodiment, the correspondence-relation between the half-mounted-states of the workpiece 100 and the optimal actions of the robot 1 is obtained using Q-learning, any technique other than Q-learning may be used as reinforcement learning. Accordingly, the above correspondence-relation may be stored in the memory in a form other than the Q-table. While, in the above embodiment, the force detector 15 detects the half-mounted-state of the workpiece 100, a state detector is not limited to the force detector 15. For example, the half-mounted-state of the workpiece 100 may be detected by mounting a pair of vibration sensors on the peripheral surface of the base end of the workpiece 100 or the front end of the hand and detecting the moment on the basis of the difference between the times at which the pair of vibration sensors detect vibration.
While, in the above embodiment, the normal control unit 24 serving as an actuator controller identifies the optimal action of the robot 1 corresponding to the half-mounted-state of the workpiece 100 detected by the force detector 15 on the basis of the Q-table stored in the memory beforehand and controls the servo motor 13 in accordance with that optimal action, the actuator controller may be configured otherwise. The robot 1 may include an actuator (e.g., cylinder) of a type other than the servo motor 13, and the actuator controller may control such an actuator so that the robot 1 takes the optimal action. While, in the above embodiment, the half-mounted-states of the workpiece 100 are classified into the six modes MD1 to MD6, the states may be classified into any other type of modes depending on the material, shape, or the like of the workpiece 100.
While, in the above embodiment, the vertical articulated robot 1 is used as a robot, the robot may be configured otherwise. While, in the above embodiment, the flexible tube is used as the workpiece 100, the shape and material of a workpiece may be of any type. For example, the workpiece 100 may be made of metal. While, in the above embodiment, press-fitting of the tubular workpiece 100 (first component) into the pipe-shaped component 101 (second component) is assumed as a workpiece mounting operation, the first component and second component need not have such configurations and therefore the mounting operation performed by the robot need not be a press-fitting operation. The robot control apparatus and robot control method of the present invention can also be applied to other types of operations.
The above description is only an example, and the present invention is not limited to the above embodiment and modifications as long as the features of the present invention are not impaired. The above embodiment can be combined as desired with one or more of the above modifications. The modifications can also be combined with one another.
1 robot, 2 controller, 12 hand, 13 servo motor, 15 force detector, 21 memory unit, 24 normal control unit, 100 workpiece, 101 component
| Number | Date | Country | Kind |
|---|---|---|---|
| 2016-168350 | Aug 2016 | JP | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2017/010887 | 3/17/2017 | WO | 00 |