This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2020-073729, filed on Apr. 16, 2020, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage medium, a learning method, and an information processing apparatus.
With development of robot functions, use of robots is expected as an alternative to manual work. To operate a robot in the same way as manual work, the robot needs to be operated by a skilled person. Therefore, to operate a robot automatically, the robot is made to learn an operation trajectory of manual work by machine learning including deep learning.
For example, a model is generated by executing machine learning by using a large amount of training data in which an image is associated with a teacher label indicating a desired operation content, and at the time of prediction after the learning is completed, an image is input to the model to predict an operation content. In addition, in object detection by machine learning, a model is generated by executing machine learning by using training data in which each image is associated with a desired output and an object position, and at the time of prediction, an object position is also predicted.
However, it may be difficult to collect a large amount of training data to which teacher labels are added (hereinafter, may be referred to as teaching data) in advance. Thus, in recent years, sequential learning (hereinafter, may be referred to as teaching-less learning) has been used, in which a machine learning model predicts an operation and machine learning is sequentially executed while obtaining feedback as to whether or not a result of the prediction is successful. Taking a picking robot as an example, an object to be gripped is predicted from an image showing a plurality of objects by using a machine learning model, picking operation is then actually performed by an actual machine according to the prediction, and success or failure of gripping is evaluated according to the actual operation. In this way, training data in which an image and success or failure of gripping are associated (hereinafter, may be referred to as a trial sample) is generated and accumulated, and when a predetermined number or more of trial samples are accumulated, machine learning is executed by using the accumulated trial samples. As related art, Lerrel Pinto, Abhinav Gupta, “Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours”, Sep. 23, 2015, arXiv: 1509.06825v1, and the like are disclosed.
According to an aspect of the embodiments, a non-transitory computer-readable storage medium stores a program that causes a computer to execute a process, the process including: calculating, among combinations of any two pieces of image data included in a plurality of pieces of image data that satisfy a first condition, similarity between the two pieces of image data in a combination in which one piece of image data satisfies a second condition in addition to the first condition; identifying, based on the calculated similarity between the two pieces of image data, a score that becomes greater as the similarity increases; and performing, by using training data based on the other image data in the combination and the score, machine learning.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, in the above sequential learning, it is difficult to include a degree of success in the training data that constitutes a trial sample. As a result, machine learning is not stable, there is a possibility of falling into a local solution, and the accuracy of machine learning may be lowered.
For example, success or failure of gripping when picking operation is performed by an actual machine may be determined only by whether the gripping succeeded or failed. Therefore, even in a case where gripping a fragile portion, such as part of a precision instrument, should desirably be learned as a failure pattern, it is learned as a success pattern as long as the gripping succeeds.
Note that, although a parameter design considering all gripping patterns is conceivable, it is not realistic because there are innumerable gripping patterns depending on the shape of an object. In addition, although it is possible to use a machine learning model that evaluates a degree of success, the cost of separately preparing training data for training such a model in advance is high, and doing so does not match the original purpose of sequential learning.
In view of the above, it is desirable to improve accuracy of machine learning.
Hereinafter, embodiments of a learning program, a learning method, and an information processing apparatus disclosed in the present application will be described in detail with reference to the drawings. Note that the present disclosure is not limited to the embodiments. In addition, the embodiments may be appropriately combined within a range without inconsistency.
[Description of Information Processing Apparatus]
An information processing apparatus 10 according to a first embodiment is an example of a computer device that predicts a gripping position (gripping object) of a picking robot (hereinafter, may be referred to as an “actual machine”), and detects the gripping position from an image showing a plurality of objects as gripping objects. In the information processing apparatus 10, a detection model using machine learning predicts an output, and machine learning of the detection model is sequentially executed while obtaining feedback as to whether or not a result of the prediction is appropriate.
For example, the information processing apparatus 10 inputs an image to the detection model, predicts a gripping position, and actually performs picking operation by the picking robot according to the prediction. Then, the information processing apparatus 10 evaluates success or failure of gripping according to the actual operation, and generates a trial sample in which the image, the success or failure of gripping, and a teacher label (successful gripping position) are associated with each other.
In this way, the information processing apparatus 10 generates teaching data from an evaluation result of the actual operation using the actual machine, and then executes machine learning of the detection model. At this time, in evaluation of success or failure of gripping, the information processing apparatus 10 improves accuracy of machine learning of the detection model not only by determining success or failure but also by calculating a grip score indicating a degree of success.
Subsequently, the information processing apparatus 10 inputs an acquired image showing an object to a detection model before learning, and acquires a prediction image for predicting a gripping object. Thereafter, the information processing apparatus 10 executes, by using an actual machine, gripping of the gripping object specified by the prediction image, and generates an actual machine result which is an image showing an actual gripping result. Then, the information processing apparatus 10 inputs an ideal gripping image showing an ideal gripping result and the actual machine result to the learned evaluation model, and calculates a grip score indicating a degree of success by using an output by the evaluation model. Thereafter, the information processing apparatus 10 generates a trial sample in which the acquired image, the teacher label, and the grip score are associated with each other.
Thereafter, when a prescribed number or more of trial samples are generated, machine learning of the detection model is executed. For example, as illustrated in
In this way, when machine learning of the detection model is completed, the information processing apparatus 10 inputs an object image showing an object to the learned detection model, and executes gripping detection (gripping prediction) for detecting a gripping position of the picking robot by an output of the detection model.
[Functional Configuration]
The communication unit 11 is a processing unit that controls communication with another device, and is implemented by a communication interface, for example. For example, the communication unit 11 transmits an operation instruction to the picking robot which is the actual machine, and acquires an operation result from the picking robot. Note that the operation result may be image data, or a command result or the like capable of generating image data.
The storage unit 12 is a processing unit that stores various types of data, programs executed by the control unit 20, and the like, and is implemented by a memory or a hard disk, for example. For example, the storage unit 12 stores an evaluation training data database (DB) 13, an ideal grip data DB 14, a trial sample DB 15, a learning result DB 16, and a detection result DB 17.
The evaluation training data DB 13 stores a plurality of pieces of training data used for machine learning of the evaluation model. For example, each piece of training data stored in the evaluation training data DB 13 includes a pair of two pieces of image data. Here, generation of the training data that is input to the evaluation model will be described. Note that the generation of training data may be executed by, for example, the control unit 20 or an evaluation model learning unit 21, which will be described later, or may be executed in advance by another device; here, an example in which the evaluation model learning unit 21 executes the generation will be described.
The ideal grip data DB 14 stores ideal grip data which is an image of a desired gripping state.
Note that the ideal grip data is captured and stored in advance by an administrator or the like. In addition, an example in which the state of
The trial sample DB 15 stores a trial sample which is an example of training data used for machine learning of the detection model. For example, the trial sample DB 15 stores a trial sample in which image data, a teacher label, and a grip score are associated with each other. The trial sample stored here is used for supervised learning of the detection model.
Here, the bounding box indicates an object gripping position in the work range captured image and has the information “x, y, h, w, θ, S, and Cn”: x and y indicate the detection position, h and w indicate the size, θ indicates a rotation angle, S indicates a grip score, and Cn indicates the probability of belonging to class n. The grip score S is set by a trial sample generation unit 23 to be described later. Note that the image data of the trial sample is input to the detection model after general data augmentation such as translation (slide), color conversion, and scale conversion is executed.
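The bounding box information above can be pictured as a simple data structure. The following Python sketch is only illustrative; the field names follow the “x, y, h, w, θ, S, Cn” description above, and the `best_class` helper (selecting the class with the highest probability) is an assumption of this sketch, not part of the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    """One detection candidate in a work range captured image.

    Field names follow the "x, y, h, w, theta, S, Cn" description;
    this class itself is an illustrative assumption.
    """
    x: float         # detection position (horizontal)
    y: float         # detection position (vertical)
    h: float         # height of the box
    w: float         # width of the box
    theta: float     # rotation angle of the box
    S: float         # grip score, set by the trial sample generation unit
    Cn: List[float]  # probability of belonging to each class n

    def best_class(self) -> int:
        """Index of the class with the highest probability."""
        return max(range(len(self.Cn)), key=lambda n: self.Cn[n])
```
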
The learning result DB 16 stores a machine learning result of each model. For example, the learning result DB 16 stores a machine learning result of the evaluation model, a machine learning result of the detection model, and the like. Here, each machine learning result includes each optimized parameter of a neural network or the like.
The detection result DB 17 stores a detection result using the learned detection model. For example, the detection result DB 17 stores an image which is a detection object and a gripping object of the picking robot, which is an output result of the detection model, in association with each other.
The control unit 20 is a processing unit that controls the entire information processing apparatus 10 and is implemented by, for example, a processor. The control unit 20 includes the evaluation model learning unit 21, a detection model learning unit 22, and a detection execution unit 25. Note that the evaluation model learning unit 21, the detection model learning unit 22, and the detection execution unit 25 may be implemented as an electronic circuit such as a processor, or may be implemented as a process executed by the processor.
The evaluation model learning unit 21 is a processing unit that generates training data stored in the evaluation training data DB 13 and executes machine learning of the evaluation model using the training data stored in the evaluation training data DB 13.
(Generation of Training Data)
First, generation of training data will be described. For example, the evaluation model learning unit 21 inputs image data to the detection model and acquires a prediction of a gripping position. Then, the evaluation model learning unit 21 executes picking operation using the actual machine for the predicted gripping position, and acquires image data of the actual picking operation. The evaluation model learning unit 21 stores the image data of the actual picking operation collected in this way in the evaluation training data DB 13, and generates the training data for the evaluation model. Note that the training data may also be created in advance by an administrator or the like and stored in the evaluation training data DB 13.
(Machine Learning of Evaluation Model)
Next, machine learning of the evaluation model will be described. For example, the evaluation model learning unit 21 generates the evaluation model by executing, using the training data stored in the evaluation training data DB 13, metric learning that uses a pair of two pieces of image data as explanatory variables and a teacher label based on the image difference between the two pieces of image data as an objective variable. Then, when machine learning is completed, the evaluation model learning unit 21 stores a learning result or the learned evaluation model in the learning result DB 16. Note that the timing for ending machine learning may be optionally set, for example, when machine learning using a predetermined number of pieces of training data has been executed or when a reconstruction error becomes equal to or smaller than a threshold.
Subsequently, as illustrated in
Using this feature, the evaluation model learning unit 21 inputs the two pieces of image data to the SNs, acquires a feature vector corresponding to each piece of image data, and calculates the distance between the feature vectors. Then, the evaluation model learning unit 21 optimizes each parameter of the SNs by using the Contrastive Loss function indicated in Equation (1), on the basis of an error (contrastive loss) based on the distance, so that the distance between samples of the same kind becomes small and the distance between different samples becomes large.
[Mathematical Formula 1]
L = Σ ½((1 − y) · L+ + y · L−)    Equation (1)
In the above example, in a case where the teacher label y (for example, 1.0) indicating that the two pieces of image data are similar is set between the two pieces of image data, the evaluation model learning unit 21 determines that the two pieces of image data are the same samples, and executes learning so that “L+=similar loss” indicated in Equation (2) becomes small. On the other hand, in a case where the teacher label y (for example, 0.0) indicating that the two pieces of image data are not similar is set between the two pieces of image data, the evaluation model learning unit 21 determines that the two pieces of image data are different samples, and executes learning so that “L−=dissimilar loss” indicated in Equation (2) becomes large.
In this way, the evaluation model learning unit 21 executes metric learning using the training data stored in the evaluation training data DB 13 to generate an evaluation model.
Returning to
(Machine Learning of Detection Model)
First, a series of flows of machine learning of the detection model will be described.
Subsequently, as illustrated in
Thereafter, as illustrated in
Then, the model learning unit 24 generates a trial sample by associating the work range captured image with the grip score, and executes machine learning of the detection model by using the trial sample. At this time, among bounding boxes in the work range captured image, a teacher label showing a correct answer may be set for a bounding box that is actually gripped, and a teacher label showing an incorrect answer may be set for other bounding boxes.
(Calculation of Grip Score)
Next, calculation of a grip score will be described. The trial sample generation unit 23 acquires a learning result of the evaluation model from the learning result DB 16, and constructs the learned evaluation model. Then, the trial sample generation unit 23 calculates a distance between an optimum gripping position and image data of an operation result by using the learned evaluation model, and sets the distance as a grip score.
Then, the trial sample generation unit 23 sets a grip score according to the Euclidean distance to a work range captured image used for generating the image data of the operation result or each bounding box in which a teacher label showing a correct answer or a teacher label showing an incorrect answer is set in the work range captured image, and generates a trial sample. In this way, the trial sample generation unit 23 generates the trial sample and stores the trial sample in the trial sample DB 15.
For example, the trial sample generation unit 23 may add a grip score to an operation result by dividing the grip score into 10 stages in a range from “1.0” to “−1.0” and associating a range of the Euclidean distance with each stage. Note that, as a feature vector of the ideal grip data, an average value over a plurality of pieces of ideal grip data may be used. In addition, since the distance between two points is in the range [0.0, √2], the reciprocal of the distance with a constant added is taken so that the grip score increases as the distance decreases. In this way, by increasing the grip score of a trial sample close to a desired sample (ideal gripping), it is possible to increase the feedback at the time of training for a trial sample closer to success.
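The grip score calculation may be sketched as follows. The description specifies only that the score increases as the distance to the ideal grip decreases, that the reciprocal of the distance with an added constant is used, and that the score is quantized into 10 stages between 1.0 and −1.0; the concrete constant c and the assignment of stage midpoints as score values are assumptions of this sketch.

```python
import math


def grip_score(distance, n_stages=10, d_max=math.sqrt(2), c=0.5):
    """Map a feature-vector distance to a grip score in [-1.0, 1.0].

    distance : Euclidean distance between the operation result and the
               ideal grip data in feature space, assumed in [0.0, sqrt(2)].
    The constant c and the stage-midpoint mapping are assumptions.
    """
    d = min(max(distance, 0.0), d_max)          # clamp into the stated range
    raw = 1.0 / (d + c)                         # reciprocal with constant: larger when closer
    lo, hi = 1.0 / (d_max + c), 1.0 / c         # attainable range of the reciprocal
    t = (raw - lo) / (hi - lo)                  # normalize to [0, 1]
    stage = min(int(t * n_stages), n_stages - 1)  # one of 10 stages
    return -1.0 + (2.0 * stage + 1.0) / n_stages  # stage midpoint in [-1, 1]
```

A distance of 0 (operation result identical to the ideal grip) falls in the top stage, and the maximum distance falls in the bottom stage, so trial samples closer to the ideal grip receive larger feedback during training.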
(Learning of Detection Model)
The model learning unit 24 is a processing unit that executes machine learning of the detection model by using each trial sample stored in the trial sample DB 15 after a certain number of trial samples is generated.
Then, the model learning unit 24 executes machine learning of the detection model so that the estimation result and data including a bounding box in which a teacher label showing a correct answer is set in the work range captured image match.
At this time, the model learning unit 24 executes machine learning of the detection model by increasing feedback as a grip score set in the trial sample input to the detection model increases and decreasing feedback as the grip score decreases.
Here, a Single Shot Multibox Detector (SSD) is used as the detection model. Before learning is started, the weight of each synapse, which is a parameter to be learned by the SSD, is initialized to a random value. The SSD is a kind of multilayer neural network (NN) and has the following features. The SSD applies a convolutional NN, which is an NN specialized for learning image data, takes image data as input, and outputs a plurality of detection candidates (bounding boxes and reliability (class probability) of detection objects) in the input image data. The reliability is a value indicating which of the previously set classes a detection object belongs to when the detection object is classified. When detection objects are to be classified into N classes, the number of classification classes of the SSD is N + 1, including a background class. At the time of detection after learning, each detection object is classified into the class with the highest reliability value. By learning so that previously set default bounding boxes match the bounding box showing a correct answer, the SSD enables detection with high accuracy regardless of the aspect ratio or the like of the input image data.
The model learning unit 24 generates the detection model by updating such parameters of the SSD by machine learning according to a grip score.
In this way, the model learning unit 24 executes the parameter update of the detection model so that the loss function is minimized. Note that, when learning is completed, the model learning unit 24 stores a learning result or the learned detection model in the learning result DB 16. In addition, the positive example error is an error related to the gripping position class, and the negative example error (Loss_neg,b) is an error related to the background class.
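One simple way to realize the score-dependent feedback described above is to scale the positive example error of each trial sample by a weight derived from its grip score. The mapping of the grip score in [−1.0, 1.0] to a multiplier in [0.0, 1.0] below is an assumption of this sketch; the description states only that the feedback increases as the grip score increases and decreases as it decreases.

```python
def weighted_detection_loss(loss_pos, loss_neg, grip_score):
    """Scale the feedback of one trial sample by its grip score.

    loss_pos   : positive example error (gripping position class).
    loss_neg   : negative example error (background class).
    grip_score : degree of success in [-1.0, 1.0] set in the trial sample.
    The linear weighting below is an assumed concrete scheme.
    """
    weight = (grip_score + 1.0) / 2.0   # map [-1, 1] to [0, 1]
    return weight * loss_pos + loss_neg
```

With this weighting, a trial sample close to the ideal grip (score near 1.0) contributes its full positive example error, while a poor trial (score near −1.0) contributes almost none, so updates are dominated by trials closer to success.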
Returning to
In addition, the detection execution unit 25 detects a gripping position specified by the prediction result, and displays the gripping position on a display or the like or stores the gripping position in the detection result DB 17. Note that the detection execution unit 25 may also store the prediction result in the detection result DB 17.
[Learning Processing of Evaluation Model]
Subsequently, the evaluation model learning unit 21 inputs the training data for evaluation (image pair) to the evaluation model (S104), and calculates a distance between vectors which correspond to the images and are output in response to the input (S105). Then, the evaluation model learning unit 21 executes metric learning on the basis of the distance between the vectors and the teacher label, and updates the learning parameters of the SNs applied to the evaluation model (S106).
Thereafter, in a case where machine learning is continued (S107: No), the evaluation model learning unit 21 repeats S102 and subsequent steps for the next training data. On the other hand, in a case where machine learning is finished (S107: Yes), the evaluation model learning unit 21 outputs the learned evaluation model to the storage unit 12 or the like (S108).
[Learning Processing of Detection Model]
Subsequently, the detection model learning unit 22 acquires a predicted gripping position on the basis of an output of the detection model (S203), executes gripping by the actual machine with respect to the predicted gripping position, and acquires an actual machine gripping result that is image data of an actual gripping result (S204).
Then, the detection model learning unit 22 inputs ideal grip data acquired from the ideal grip data DB 14 and the actual machine gripping result by the actual machine to the learned evaluation model (S205). The detection model learning unit 22 calculates a grip score that increases as similarity increases, by using a distance between vectors that are output results of the learned evaluation model (S206).
Thereafter, the detection model learning unit 22 generates a trial sample in which the work range captured image read in S202 is associated with the grip score calculated in S206, and stores the trial sample in the trial sample DB 15 (S207). Here, the detection model learning unit 22 repeats S202 and subsequent steps for the next image until the number of trial samples reaches a prescribed number (S208: No).
On the other hand, when the number of trial samples reaches the prescribed number (S208: Yes), the detection model learning unit 22 executes machine learning of the detection model.
For example, the detection model learning unit 22 reads one trial sample from the trial sample DB 15, and inputs the trial sample to the detection model (S209). Then, the detection model learning unit 22 acquires a predicted gripping position (S210), and executes machine learning of the detection model by feedback according to the grip score so that the predicted gripping position and data including a bounding box in which a teacher label showing a correct answer is set match (S211).
Then, in a case where learning is continued (S212: No), the detection model learning unit 22 executes S209 and subsequent steps for the next image. On the other hand, in a case where learning is finished (S212: Yes), the detection model learning unit 22 outputs the learned detection model to the storage unit 12 or the like (S213).
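The flow of steps S201 to S213 may be summarized in the following skeleton. All collaborating objects are stand-ins for components of the information processing apparatus and the actual machine: `detection_model` is assumed to provide `predict()` and `train()`, `evaluation_model` a `grip_score()` method, and `actual_machine` a `grip()` method; none of these names appear in the embodiment itself.

```python
def sequential_learning(images, detection_model, evaluation_model,
                        actual_machine, ideal_grip, prescribed_number=100):
    """Teaching-less learning loop sketched from steps S201-S213.

    Every collaborator is an assumed stub; see the lead-in above.
    """
    trial_samples = []
    for image in images:                                     # S201-S202: read next image
        position = detection_model.predict(image)            # S203: predicted gripping position
        result = actual_machine.grip(position)               # S204: grip with the actual machine
        score = evaluation_model.grip_score(ideal_grip, result)  # S205-S206: grip score
        trial_samples.append((image, position, score))       # S207: store trial sample
        if len(trial_samples) >= prescribed_number:          # S208: prescribed number reached
            break
    for image, position, score in trial_samples:             # S209-S212: training loop
        detection_model.train(image, position, score)        # feedback according to grip score
    return detection_model                                   # S213: learned detection model
```
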
[Effects]
As described above, the information processing apparatus 10 may use the distance between feature vectors of trial samples extracted by the evaluation model as a score of trial success or failure at the time of sequential learning, and reflect the distance in machine learning. Therefore, the information processing apparatus 10 may execute machine learning in which a degree of success is reflected as a value that increases as a trial sample approaches a target trial result.
In addition, the information processing apparatus 10 may apply metric learning that uses similarity between trial samples as teacher data, saving the labor of labelling each sample acquired in each trial. Furthermore, in the information processing apparatus 10, since a combination of trial samples becomes the input data at the time of training, the number of pieces of training data may be increased as compared with normal supervised learning. In addition, the information processing apparatus 10 may train a model such that the distance between trial samples is small for samples with the same label and large for samples with different labels.
In this way, in the information processing apparatus 10, by increasing a grip score of a trial sample close to a desired sample, it is possible to increase feedback at the time of machine learning of the trial sample closer to success, and preferentially perform learning. Therefore, the information processing apparatus 10 may improve prediction accuracy.
In addition, since the information processing apparatus 10 performs machine learning on the basis of training data to which a grip score is added, it is less likely to fall into a local solution. Furthermore, in the prior art, an influence of data acquired at an initial stage of sequential learning is large. However, in the method of the first embodiment, since feedback becomes large even for data added later, an influence of data acquisition order may be reduced. Therefore, the information processing apparatus 10 may resolve instability of learning due to order in which trial samples are added to a data set, and may improve stability of machine learning (training).
In addition, the information processing apparatus 10 may omit detailed design of a threshold or the like by comparing feature values extracted by a learner from raw image data. Furthermore, since the information processing apparatus 10 may use a combination of pieces of data as training data, it is possible to create more variations even if the number of pieces of original data is the same, and to omit preparation of a large-scale data set.
Although the embodiment has been described above, the embodiment may be implemented in various different forms in addition to the embodiment described above.
[Numerical Values or the Like]
The type of target data, the distance calculation method, the learning method applied to each model, the model configuration of the neural network, and the like used in the above embodiment are merely examples and may be optionally changed. In addition, these may be used not only for prediction of a gripping position of the picking robot, but also for determination of image data in various fields such as recognition of an unauthorized person, and sound data and the like may be applied. Furthermore, a device that executes machine learning and a device that performs detection using a model after machine learning may be implemented by separate devices.
[Trial Sample]
In addition, by using as trial samples only the work range captured images obtained when gripping is successful, it is possible to improve accuracy in prediction of a gripping position. Furthermore, by using as trial samples the work range captured images obtained both when gripping succeeds and when it fails, the number of times of training may be increased, and falling into a local solution may be suppressed.
[Pre-Learning]
In the above evaluation model, learning accuracy may be improved by executing pre-learning to improve accuracy in extraction of a feature value (feature vector). In addition, since similarity between two pieces of image data (image pair) is calculated by pre-learning, it is possible to set a threshold when a teacher label is set at the time of learning of the evaluation model illustrated in
On the other hand, when pre-learning is completed (S302: Yes), the information processing apparatus 10 stores a learning result and a threshold in the storage unit 12 (S303). Thereafter, the information processing apparatus 10 generates training data for evaluation as in the first embodiment (S304), executes the learning processing of the evaluation model illustrated in
[System]
Pieces of information including a processing procedure, a control procedure, a specific name, various types of data, and parameters described herein or illustrated in the drawings may be optionally changed unless otherwise specified. Note that the evaluation model is an example of a first model, and the detection model is an example of a second model. The image data in which gripping is successful is an example of image data satisfying a first condition, and the image data showing an ideal gripping state (ideal grip data) is an example of image data satisfying a second condition. The evaluation training data DB 13 is an example of a data set. In addition, the image data is not limited to two-dimensional data, and three-dimensional data may also be used.
In addition, the predicted gripping position is an example of a predicted gripping object. Furthermore, the ideal grip data is an example of a desired trial result. Note that the trial sample generation unit 23 is an example of a first calculation unit and a second calculation unit, and the model learning unit 24 is an example of an execution unit.
In addition, each component of each device illustrated in the drawings is functionally conceptual and does not necessarily have to be physically configured as illustrated in the drawings. For example, specific forms of distribution and integration of each device are not limited to those illustrated in the drawings. For example, all or a part of the devices may be configured by being functionally or physically distributed or integrated in optional units according to various types of loads, usage situations, or the like.
Furthermore, all or any part of individual processing functions performed in each device may be implemented by a central processing unit (CPU) and a program analyzed and executed by the CPU, or may be implemented as hardware by wired logic.
[Hardware]
Next, a hardware configuration example of the information processing apparatus 10 will be described.
The communication device 10a is a network interface card or the like and communicates with another server. The HDD 10b stores programs and DBs for operating the functions illustrated in
The processor 10d reads a program that executes processing similar to that of each processing unit illustrated in
As described above, the information processing apparatus 10 operates as an information processing apparatus that executes an information processing method by reading and executing the program. In addition, the information processing apparatus 10 may also implement functions similar to those of the above embodiments by reading the above program from a recording medium by a medium reading device and executing the read program. Note that the program referred to in the embodiments is not limited to being executed by the information processing apparatus 10. For example, the embodiments may be similarly applied to a case where another computer or server executes the program, or a case where such a computer and server cooperatively execute the program.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---
2020-073729 | Apr 2020 | JP | national |