The present invention relates to a robot control device, and a method and a non-transitory computer-readable storage medium for controlling the same.
In the field of factory automation (FA), attention has been drawn to automating factory operations by using a robot arm. An example of a task in which a robot arm is used is a pick-and-place operation. To achieve the pick-and-place operation, a program for controlling the robot arm needs to be created; this process is referred to as teaching. The teaching mainly consists of photographing a workpiece with a 2D or 3D camera, estimating its position and shape by computer vision, and controlling the robot arm into a specific position and orientation (for example, Patent Document 1: Japanese Patent Laid-Open No. 2017-124450). Of these, estimating the position and the shape particularly requires trial and error, and thus requires man-hours. Moreover, in an actual factory site there are workpieces of various shapes, so the teaching needs to be performed for each workpiece, and a complex task, such as handling workpieces loaded in bulk, makes the teaching even more difficult. In recent years, with the advent of the AI boom, technologies in which AI is used for robot arm control have appeared. An example is Non-Patent Document 1, “Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning (Google)”.
In Patent Document 1, however, position and orientation estimation is performed by matching against a 3D model, and a relatively expensive 3D camera needs to be used in order to acquire position and orientation information of the workpiece with high accuracy.
According to an aspect of the invention, there is provided a robot control device for controlling a robot configured to perform a predetermined operation, the robot control device comprising: an acquisition unit configured to acquire a plurality of images captured by a plurality of image capturing devices including a first image capturing device and a second image capturing device different from the first image capturing device; and a specification unit configured to use the plurality of captured images acquired by the acquisition unit as inputs to a neural network, and configured to specify a control instruction for the robot based on an output as a result from the neural network.
According to the present invention, by providing a neural network that can perform robot control from an input of a 2D video image, a predetermined operation can be performed by a robot with an intuitive and simple configuration.
Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).
Hereinafter, embodiments will be described in detail by referring to the accompanying drawings. Note that the following embodiments do not limit the invention according to the claims. Although a plurality of features is described in the embodiments, some of the plurality of features may not be essential to the invention, and the plurality of features may be arbitrarily combined. Further, in the accompanying drawings, identical or similar components are denoted by identical reference signs, and redundant description will be omitted.
In addition, Non-Patent Documents 2 to 5 to be cited in the embodiments are as follows.
The two embodiments to be described below are common in terms of their basic configuration and the execution of machine learning, but differ from each other in the environment in which learning is performed. Thus, outlines of these two embodiments will be described first.
In a first embodiment, a robot control system will be described that performs robot control by performing learning of a neural network on a simulator on a computer and applying the learned model to an actual robot. Since the simulator can be operated faster than the actual robot, the learning converges more quickly.
In a second embodiment, a robot control system that performs learning of a neural network on an actual robot will be described. While the use of the simulator in the first embodiment has the advantage that learning can be sped up, some contrivance is needed to bridge the gap between the simulator and the actual robot when the model learned on the simulator is applied to the actual robot. By performing the learning on the actual robot, the difference in environment between learning and inference can be eliminated.
The outlines of the two embodiments have been described above. Each of the embodiments will now be described in detail. Note that redundant descriptions of parts common to the respective embodiments will be omitted.
In the first embodiment, a process of constructing and using a learning model for a picking operation, in which a robot arm moves from an initial state and grips a workpiece, will be described. The operation after the gripping is not particularly limited, but examples thereof include movement to another location, alignment, and inspection. The operation after the gripping may be implemented with the neural network configuration described hereinafter, or movement and alignment may be performed by motion planning.
In addition, when
The actual robot 100 is a robot that operates by means of an articulated structure and servo motors, and includes an arm. A gripper 101 for gripping a target object is attached to the robot arm. Note that the specific configurations of the robot arm 100 and the gripper 101 are well known to those skilled in the art, and thus detailed description thereof will be omitted.
Further, the first image capturing device 110 and the second image capturing device 120 are cameras each capable of acquiring a two-dimensional color image composed of RGB components, but information other than RGB, such as distance information, may also be included. The workpiece 130 is the target object gripped by the robot arm 100; its positional coordinates can be acquired on the simulator, and its arrangement position can be specified arbitrarily.
In S10, the control unit 20 initializes a time T to “0”. Next, in S11, the control unit 20 initializes the state and starts an episode. An episode is a unit of a series of processes from the start to the end of a task in reinforcement learning. In the present embodiment, the positions of the robot and the workpiece are in an initial state at the start of the episode, and the episode is ended when an episode end condition is satisfied. The episode end condition is, for example, that the agent succeeds in the task or that an error occurs. An error is, for example, a case where the robot arm collides with itself or with the floor. The specific initialization of the state consists of moving the robot arm 100 to a predetermined position, placing the workpiece 130 in a predetermined position, and setting the accumulated total of the obtained reward to “0”. In this case, the robot arm 100 may be returned to a fixed position, but when the workpiece 130 is randomly placed within the range the arm can reach, the neural network can learn to select an action while taking into account the position of the workpiece in the input image. In S12, the control unit 20 initializes the number of steps t to “0”.
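For illustration only, the state initialization of S11 can be sketched in Python as follows. The simulator interface (sim.move_arm_to, sim.place_workpiece) and the numeric values are hypothetical placeholders and are not part of the embodiment.

```python
import numpy as np

# Hypothetical simulator interface; the pose and reach values are assumptions.
ARM_HOME_POSE = np.zeros(6)   # predetermined joint angles for the robot arm 100
REACH_RADIUS = 0.5            # assumed reachable radius [m] around the arm base

def reset_episode(sim, rng):
    """S11: return the arm to a fixed pose, place the workpiece 130 at a random
    reachable position, and reset the accumulated reward to 0."""
    sim.move_arm_to(ARM_HOME_POSE)
    # Random placement within reach lets the network learn to read the
    # workpiece position from the input image when selecting an action.
    angle = rng.uniform(0.0, 2.0 * np.pi)
    radius = rng.uniform(0.1, REACH_RADIUS)
    sim.place_workpiece(np.array([radius * np.cos(angle), radius * np.sin(angle)]))
    return 0.0  # accumulated reward for the new episode
```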
In S13, the control unit 20 causes the first image capturing device 110 and the second image capturing device 120 to capture images, and receives the captured images. In S14, the control unit 20 inputs the captured images to the neural network 340. When doing so, the control unit 20 resizes each captured image into a reduced image of, for example, 84×84 pixels. In S15, the control unit 20 operates the robot arm 100 in accordance with a control instruction output by the neural network 340. The control instruction for the robot, which is the output of the neural network, is the output of a softmax function and is expressed as a probability of which axis is to be moved; the robot is operated according to this probability. Note that the output of the neural network need not be the control instruction itself; which control instruction to use may instead be determined based on the output of the neural network. This is possible, for example, by holding a table in which outputs of the neural network are associated with control instructions. In this manner, various forms can be employed as long as the control unit 20 can identify the control instruction based on the output of the neural network.
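A minimal sketch of S13 to S15 is shown below, assuming OpenCV for resizing and a hypothetical policy_net that returns softmax probabilities over the axes to be moved; robot.move_axis is likewise a placeholder.

```python
import numpy as np
import cv2  # assumed available for image resizing

def step_robot(policy_net, image_a, image_b, robot, rng):
    # S13/S14: reduce both captured images to 84x84 pixels and form one input.
    small_a = cv2.resize(image_a, (84, 84))
    small_b = cv2.resize(image_b, (84, 84))
    net_input = np.concatenate([small_a, small_b], axis=-1).astype(np.float32) / 255.0
    # S15: the network output is a softmax probability over which axis to move;
    # the robot is operated according to that probability.
    probs = policy_net(net_input)
    action = rng.choice(len(probs), p=probs)
    robot.move_axis(action)
    return action
```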
In S16, the control unit 20 determines whether a reward providing condition (see table 400 in
In S19, the control unit 20 determines whether or not the time T is equal to or larger than a predetermined threshold Th_a. When the time T is equal to or larger than the threshold Th_a, the control unit 20 stores the weights of the neural network as the learned model. A large value such as 10^8 is specified as the threshold Th_a in S19. This is because it is unpredictable when the learning will converge, so a large value is specified as the threshold so that the learning loop is repeated. However, it is also possible to determine that the learning has converged and end the learning at that point.
On the other hand, in a case where the determination result of S19 indicates that the time T is smaller than the threshold Th_a, the control unit 20 advances the processing to S21. In S21, the control unit 20 determines whether or not the number of steps t is equal to or larger than a threshold Th_b. When the number of steps t is equal to or larger than the threshold Th_b, the control unit 20 advances the processing to S22. In S22, the control unit 20 performs learning of the plurality of steps as a batch. The threshold Th_b for the number of steps t is the unit in which batch learning is performed, and is set to “20”, for example. Thereafter, the control unit 20 returns the processing to S12.
In addition, in a case where the determination result of S21 indicates that the number of steps t is smaller than the threshold Th_b, the control unit 20 advances the processing to S23. In S23, the control unit 20 determines whether or not the episode end condition is satisfied. When the control unit 20 determines that the episode end condition is not satisfied, the control unit 20 returns the processing to S13. When it is determined that the episode end condition is satisfied, the control unit 20 advances the processing to S24. In S24, the control unit 20 performs learning of the neural network; the batch size of the learning at this time is the number of steps t. In the learning of the neural network, weight values are adjusted so as to reduce the error of the output of each perceptron by a technique referred to as backpropagation. Details of the learning are omitted because they are well known.
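The overall control flow of S10 to S24 can be summarized by the following structural sketch, using the thresholds given above (Th_a = 10^8, Th_b = 20); env and agent are hypothetical placeholders, not components disclosed in the embodiment.

```python
TH_A = 10 ** 8   # threshold for the total time T (S19)
TH_B = 20        # number of steps learned as one batch (S21/S22)

def train(env, agent):
    T = 0                                        # S10
    while T < TH_A:                              # S19
        obs = env.reset()                        # S11: start a new episode
        done = False
        while not done:
            t, transitions = 0, []               # S12
            while t < TH_B and not done:
                action = agent.act(obs)          # S13-S15
                next_obs, reward, done = env.step(action)   # S16-S18
                transitions.append((obs, action, reward))
                obs = next_obs
                t += 1
                T += 1
            agent.update(transitions)            # S22 / S24: batch size is t
    agent.save("learned_model")                  # S20: store the learned weights
```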
Here, an outline of a configuration of the neural network will be described by using
Reference signs 401 and 402 denote layers, referred to as convolutional layers, that extract an image feature amount by applying filters with predetermined parameters to input image data 410. The predetermined parameters of the filters correspond to weights of the neural network. Reference sign 403 denotes a fully connected layer, which combines the data from which feature portions have been extracted through the convolutional layers into one node. Reference sign 404 denotes a long short-term memory (LSTM), a type of recurrent neural network that learns and retains long-term dependencies between time steps of time-series data. Reference sign 405 denotes a fully connected layer whose output is converted to a probability by a softmax function and serves as the strategy; the strategy is the probability of taking each action in a given state. Reference sign 406 denotes a fully connected layer whose output is a state value function, that is, a predicted value of the reward to be obtained with the current state as a starting point. While the A3C configuration has been described above, UNREAL is configured with three auxiliary tasks in addition to A3C. Reference sign 420 denotes a replay buffer, which holds the most recent several steps of images, rewards, and actions. The inputs to the three auxiliary tasks are images obtained from the replay buffer 420.
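As a point of reference only, the A3C portion described above (401 to 406) could be expressed in PyTorch roughly as follows; the layer sizes and the number of actions are illustrative assumptions rather than values taken from the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class A3CNet(nn.Module):
    def __init__(self, in_channels=3, num_actions=12):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=8, stride=4)  # 401
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)           # 402
        self.fc = nn.Linear(32 * 9 * 9, 256)                              # 403
        self.lstm = nn.LSTMCell(256, 256)                                 # 404
        self.policy = nn.Linear(256, num_actions)                         # 405
        self.value = nn.Linear(256, 1)                                    # 406

    def forward(self, x, hidden=None):
        # x: (batch, in_channels, 84, 84) image tensor
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        h, c = self.lstm(x, hidden)
        pi = F.softmax(self.policy(h), dim=-1)   # strategy: action probabilities
        v = self.value(h)                        # state value function
        return pi, v, (h, c)
```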
One of the auxiliary tasks is reward prediction 407, which estimates an immediate reward from past information indicating that a reward was obtained. Generally, reinforcement learning suffers from the so-called sparse reward problem: the agent can learn only from experiences in which a reward is obtained, and a reward is obtained only when the task is successful. For example, in the present embodiment as well, a reward cannot be obtained when the robot arm 100 is operated by only one step from the initial state. By using the reward prediction task in such an environment, events in which a reward occurred are deliberately retrieved from the replay buffer so that a learning signal is generated even when rewards are sparse. The second auxiliary task is value function replay, which has the same function as the output of the fully connected layer 406, except that its input image is taken from the replay buffer. The third is pixel control 408, which learns actions that cause large changes in the input image; its output is an action value function that estimates the amount of pixel change after an action is taken.
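As one possible illustration of the replay buffer 420 and the skewed sampling used for reward prediction, consider the following sketch; the class and its parameters are assumptions introduced for explanation only.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=2000):
        self.frames = deque(maxlen=capacity)   # (image, action, reward) tuples

    def add(self, image, action, reward):
        self.frames.append((image, action, reward))

    def sample_for_reward_prediction(self, seq_len=4):
        """Return seq_len consecutive frames; half of the time the sequence is
        forced to end on a frame whose reward is non-zero, so that rare
        rewarding events still produce a learning signal."""
        if len(self.frames) < seq_len:
            raise ValueError("not enough frames buffered yet")
        want_reward = random.random() < 0.5
        candidates = [i for i in range(seq_len - 1, len(self.frames))
                      if (self.frames[i][2] != 0) == want_reward]
        if not candidates:                      # fall back if no such frame exists
            candidates = list(range(seq_len - 1, len(self.frames)))
        end = random.choice(candidates)
        return [self.frames[i] for i in range(end - seq_len + 1, end + 1)]
```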
The input image 410 in
In the manner described above, the features of the input data are learned, and a learned model for estimating a control instruction for the robot arm from the input is obtained through repeated training.
The present first embodiment applies the learned model of the neural network that has been learned on the simulator to the actual robot.
In S100, the control unit 20 loads the learned model that has been stored in S20 in
In the present first embodiment, the model learned on the simulator has been applied as it is to the actual machine, but the appearance of a video image on the simulator and that of a video image in the real world are not completely the same in terms of lighting, object texture, and the like. Thus, even when an image of the real world is input to the neural network 340 in S102, the expected control instruction may not be output. In a method referred to as domain randomization (Non-Patent Document 5), parameters such as the background, the texture of the workpiece, the position of the light source, brightness, colors, the position of the camera, and noise are varied over a wide range while learning is performed on the simulator, so that a robust, generalized neural network that adapts to video images in the real world can be constructed. In the present first embodiment, a neural network model that reduces the appearance gap between video images on the simulator and in the real world can be constructed, for example, by randomly changing these parameters for each episode to vary the appearance of the environment.
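A minimal sketch of such per-episode randomization is given below; the sim.set_* calls are hypothetical simulator hooks and the parameter ranges are assumptions.

```python
import numpy as np

def randomize_domain(sim, rng):
    """Call at the start of each episode (S11) to vary the appearance of the
    simulated environment."""
    sim.set_background_color(rng.uniform(0.0, 1.0, size=3))
    sim.set_workpiece_texture(rng.integers(0, 10))          # pick one of N textures
    sim.set_light_position(rng.uniform(-1.0, 1.0, size=3))
    sim.set_light_intensity(rng.uniform(0.5, 1.5))
    sim.set_camera_offset(rng.normal(0.0, 0.01, size=3))    # small camera jitter [m]
    sim.set_image_noise_sigma(rng.uniform(0.0, 0.05))

# Example usage: rng = np.random.default_rng(0); randomize_domain(sim, rng)
```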
According to the operations described above, it is possible to control the robot simply by inputting only video images from two-dimensional image capturing devices to the neural network.
Here, by using a technique referred to as Grad-CAM (Non-Patent Document 4), which indicates where in an image the convolutional layers of the neural network focus their interest, it is possible to visualize which part of the image the neural network attends to when making a determination. The inside of a neural network in deep learning is typically a black box and is not easily analyzed; even when a task succeeds or fails, it is difficult to understand why. Therefore, it is very important to visualize a point of interest (or a region of interest) of the neural network. Generally, the convolutional layers retain spatial information, which is lost in the fully connected layers, and the later a convolutional layer is, the more abstract the information it holds; Grad-CAM therefore uses the information of the last convolutional layer to create a heat map. Since the details are described in Non-Patent Document 4, they are omitted here, but the method of applying Grad-CAM to the neural network used in the present embodiment will be briefly described below.
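As one way of realizing this, a Grad-CAM heat map for the last convolutional layer can be computed as in the following sketch, assuming a PyTorch model such as the A3CNet sketched earlier whose last convolutional layer is conv2; this wiring is an illustrative assumption and not the embodiment's actual implementation.

```python
import torch.nn.functional as F

def grad_cam(model, image, target_action):
    """Return a heat map of where the last convolutional layer (model.conv2)
    focuses when the probability of target_action is computed."""
    acts, grads = {}, {}

    def fwd_hook(module, inputs, output):
        acts["a"] = output
        def save_grad(g):
            grads["g"] = g
        output.register_hook(save_grad)

    handle = model.conv2.register_forward_hook(fwd_hook)
    pi, v, _ = model(image)                  # hidden-state handling is model-specific
    handle.remove()

    model.zero_grad()
    pi[0, target_action].backward()          # gradient of the chosen action's probability

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))        # weighted sum of feature maps
    cam = cam / (cam.max() + 1e-8)                        # normalize to [0, 1]
    # Upsample the coarse map back to the input image size for overlay as a heat map.
    return F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                         mode="bilinear", align_corners=False)[:, 0]
```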
Next, the second embodiment will be described. Note that the second embodiment and the first embodiment share the same basic configuration and operation, and thus redundant descriptions of these points will be omitted. In the second embodiment, the learning of the neural network is also performed on the actual machine.
Thus, the domain randomization required for learning on the simulator is unnecessary. Regarding the reward, the distance between the workpiece and the end effector is easily determined on the simulator; in reality, however, while the absolute position of the end effector can be determined from kinematics, the position of the workpiece cannot be determined mechanically, so the workpiece is placed manually and an operation such as inputting its position is required. The same applies when fine tuning is performed on the actual machine in the first embodiment.
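For example, the reward check on the actual machine could be sketched as follows, where forward_kinematics and the distance threshold are assumptions introduced for illustration.

```python
import numpy as np

GRIP_DISTANCE_THRESHOLD = 0.02   # [m] distance regarded as close enough to grip

def reached_workpiece(joint_angles, workpiece_pos_manual, forward_kinematics):
    """The end-effector position comes from kinematics; the workpiece position
    must be entered manually because it cannot be measured mechanically."""
    end_effector_pos = forward_kinematics(joint_angles)
    distance = np.linalg.norm(np.asarray(end_effector_pos) -
                              np.asarray(workpiece_pos_manual))
    return distance < GRIP_DISTANCE_THRESHOLD
```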
According to the operations described above, it is possible to control the robot simply by inputting only video images from two-dimensional image capturing devices to the neural network.
In the embodiments described above, the so-called picking operation in which the robot arm moves to grip a workpiece has been described, but the present invention is also applicable to other operations. For example, a different working device can be attached to the tip of the robot arm so that the invention is applied to welding, measurement, inspection, surgery, and the like.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2019-192132, filed Oct. 21, 2019, which is hereby incorporated by reference herein in its entirety.