This application claims priority to Korean Patent Application No. 10-2021-0067628, filed on 26 May 2021, the disclosure of which is incorporated by reference herein in its entirety.
The present disclosure relates to a technology for constructing kinematic information of a robot manipulator, and more particularly, to an apparatus for constructing kinematic information of a robot manipulator from image information of the robot manipulator and a method therefor.
While most robot manipulators in the past were PUMA-type robots that resemble the human arm and SCARA-type robots that work on planar surfaces, various types of robots have been emerging recently with the advent of the fourth industrial era, including dual-arm robots, surgical robots, and medical and sports-assistive robots. Moreover, customized robot manipulators (customized robot arms) are manufactured for on-site work at small and medium-sized businesses; however, it is difficult to adapt them for use in actual situations due to a lack of expertise.
(Patent Document 1) Korean Unexamined Patent No. 2010-0105143 (published on Sep. 29, 2010).
The present disclosure provides an apparatus for constructing kinematic information of a robot manipulator from image information of the robot manipulator and a method therefor.
An exemplary embodiment of the present disclosure provides an apparatus for constructing kinematic information of a robot manipulator, the apparatus including: a robot image acquisition part for acquiring a robot image containing shape information and coordinate information of the robot manipulator; a feature detection part for detecting the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint using a feature detection model generated through deep learning based on the robot image containing shape information and coordinate information; and a variable derivation part for deriving Denavit-Hartenberg (DH) parameters based on the type of each of the plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint.
The feature detection part may feed the robot image containing shape information and coordinate information as input into the feature detection model generated through deep learning, and the feature detection model may produce a computed value for each of a plurality of joints of the robot manipulator included in the robot image, including the type of the joint and the three-dimensional coordinates of the joint, by performing operations using learned weights on the shape information and coordinate information of the robot image.
The variable derivation part may allocate a joint coordinate system to each of the plurality of joints according to the type of the joint and derive DH parameters based on the allocated joint coordinate system and the detected three-dimensional coordinates of the joint.
The apparatus may further include a model generation part which provides a labeled robot image for training, the label including the type of each of the plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint, which feeds the robot image for training as input into a feature detection model with initialized weights, which, once the feature detection model produces a computed value for each of the plurality of joints of the robot manipulator, including the type of the joint and the three-dimensional coordinates of the joint, by performing operations on the robot image for training using the initialized weights, produces a loss representing the difference between the computed value and the label, and which performs optimization to modify the weights of the feature detection model so as to minimize the difference between the computed value and the label.
Another exemplary embodiment of the present disclosure provides a method for constructing kinematic information of a robot manipulator, the method including: acquiring, by a robot image acquisition part, a robot image containing shape information and coordinate information of the robot manipulator; detecting, by a feature detection part, the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint using a feature detection model generated through deep learning based on the robot image containing shape information and coordinate information; and deriving, by a variable derivation part, Denavit-Hartenberg (DH) parameters based on the type of each of the plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint.
The detecting of the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint may include: feeding, by the feature detection part, the robot image containing shape information and coordinate information as input into the feature detection model generated through deep learning; and producing, by the feature detection model, a computed value for each of the plurality of joints of the robot manipulator included in the robot image, including the type of the joint and the three-dimensional coordinates of the joint, by performing operations using learned weights on the shape information and coordinate information of the robot image.
The deriving of Denavit-Hartenberg (DH) parameters based on the type of each of the plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint may include: allocating, by the variable derivation part, a joint coordinate system to each of the plurality of joints according to the type of the joint; and deriving, by the variable derivation part, DH parameters based on the allocated joint coordinate system and the detected three-dimensional coordinates of the joint.
The method may further include: prior to the acquiring of a robot image containing shape information and coordinate information of the robot manipulator, providing, by a model generation part, a labeled robot image for training, the label including the type of each of the plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint; feeding, by the model generation part, the robot image for training as input into a feature detection model with initialized weights; producing, by the feature detection model with initialized weights, a computed value for each of the plurality of joints of the robot manipulator, including the type of the joint and the three-dimensional coordinates of the joint, by performing operations on the robot image for training using the initialized weights; producing, by the model generation part, a loss representing the difference between the computed value and the label; and performing, by the model generation part, optimization to modify the weights of the feature detection model so as to minimize the difference between the computed value and the label.
According to the present disclosure, it is possible to make the analysis of direct kinematics, inverse kinematics, and kinetics easier by providing kinematic information of a robot manipulator derived from image information of the robot manipulator. Accordingly, the robot manipulator may be customized to fit actual on-site situations.
Prior to the detailed description of the present disclosure, the terms and words used in this specification and the claims should not be construed as being limited to their ordinary or dictionary meanings, but should be interpreted as having meanings and concepts consistent with the technical ideas of the present disclosure, based on the principle that the inventor may properly define the concepts of terms in order to describe his or her own disclosure in the best way possible. Therefore, the configurations shown in the exemplary embodiments and drawings described in this specification are merely the most preferred embodiments of the present disclosure and do not represent all of the technical ideas of the present disclosure, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing of the present application.
Hereinafter, exemplary embodiments in the present disclosure will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals will be used throughout to designate the same or like elements. Further, the detailed description of well-known functions and configurations that may obscure the gist of the present disclosure will be omitted. For the same reason, some of the components in the accompanying drawings may be exaggerated, omitted, or schematically illustrated, and the dimensions of respective components may not accurately reflect the actual sizes of the components.
First of all, a configuration of an apparatus for constructing kinematic information of a robot manipulator according to an embodiment of the present disclosure will be described.
Referring to
The model generation part 110 is used for generating a feature detection model FDM, which is a deep learning model according to an embodiment of the present disclosure, through deep learning. The model generation part 110 trains the feature detection model FDM using a robot image for training. Such a robot image for training carries a label that includes the type of each of a plurality of joints of a robot manipulator and the three-dimensional coordinates of the joint. Accordingly, the model generation part 110 may generate, through deep learning using robot images for training, a feature detection model FDM that calculates features of a robot manipulator, that is, the type of each of a plurality of joints and the three-dimensional coordinates of the joint. Such a learning method will be described in more detail below.
The image acquisition part 200 is used for acquiring a robot image containing the robot manipulator's shape information and coordinate information. That is, as depicted in
According to an embodiment, the image acquisition part 200 may include a camera module 210, and may acquire a robot image using the camera module 210. The camera module 210 is used for capturing a robot image. The camera module 210 may include an optical camera 211 and a depth rendering camera 212. The optical camera 211 and the depth rendering camera 212 capture images in sync with each other. The optical camera 211 captures an optical image made up of a plurality of pixel values, and such an optical image constitutes the shape information of the robot image. Such shape information may be provided as the pixel value of each pixel of the robot image and may include three channels of (R, G, B). The depth rendering camera 212 may capture an image that provides depth coordinates (distance), that is, a depth image, and the camera module 210 may generate and provide a robot image containing coordinate information based on the depth coordinates (distance) from the depth rendering camera 212. To do so, the camera module 210 acquires internal parameters of the optical camera 211, such as the resolution (i.e., the number of pixels in height and width of an optical image captured by the optical camera 211 in synchronization with a depth image captured by the depth rendering camera 212), the angle of view, and the focal length. The camera module 210 then calculates the direction angle of each pixel in the optical image with respect to the optical axis of the optical camera 211, and calculates the coordinate information of each pixel in the optical image by using the calculated direction angle of each pixel and the depth coordinates (distance) of the depth image captured by the depth rendering camera 212. Such coordinate information may be three-dimensional coordinates with respect to the focal point of the optical camera 211.
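To illustrate this calculation, the following is a minimal sketch, assuming a simple pinhole camera model and treating each depth value as the distance along the pixel's viewing ray; the function names and the particular form of the internal parameters (fx, fy, cx, cy) are assumptions for illustration and are not recited in the disclosure.

```python
import numpy as np

def pixel_to_camera_coords(u, v, distance, fx, fy, cx, cy):
    """Back-project one pixel to 3D coordinates relative to the optical
    camera's focal point, assuming a pinhole model.

    u, v     : pixel column/row in the optical image
    distance : range reported by the depth camera for this pixel, taken
               here as the distance along the viewing ray
    fx, fy   : focal lengths in pixels (from the internal parameters)
    cx, cy   : principal point (image center) in pixels
    """
    # Viewing ray through pixel (u, v); its angle to the optical axis is
    # the "direction angle" mentioned in the text.
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    ray /= np.linalg.norm(ray)
    return distance * ray          # (x, y, z) with respect to the focal point

def depth_image_to_point_cloud(depth, fx, fy, cx, cy):
    """Apply the back-projection to every pixel of a synchronized depth image."""
    h, w = depth.shape
    vs, us = np.mgrid[0:h, 0:w]
    rays = np.stack([(us - cx) / fx, (vs - cy) / fy, np.ones_like(depth)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays * depth[..., None]   # H x W x 3 coordinate information
```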
According to another embodiment, the image acquisition part 200 may extract a robot image from a design file, without using the camera module 210. The design file contains information such as the shape and dimensions of the robot manipulator, and the image acquisition part 200 may generate shape information from the design file by converting the format of the design file and generate coordinate information by deriving coordinates from the dimensions, thereby constructing a robot image.
The feature detection part 300 is used for detecting the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint from the robot image. The feature detection part 300 receives a robot image containing shape information and coordinate information, and detects the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint based on the robot image.
The variable derivation part 400 is used for configuring Denavit-Hartenberg (DH) parameters based on the type of each of the plurality of joints and the three-dimensional coordinates of the joint which are detected by the feature detection part 300.
Now, the above-described feature detection model, which is trained through deep learning so as to detect features of the robot manipulator, will be described in more detail.
Referring to
The convolution, downsampling and upsampling, and deconvolution operations use a kernel made up of a given matrix, and the values of the elements of the matrix constituting the kernel are used as weights (w). Also, examples of the activation function may include Sigmoid, hyperbolic tangent (tan h), exponential linear unit (ELU), rectified linear unit (ReLU), leaky ReLU, Maxout, Minout, Softmax, etc.
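As a point of reference only, the following is a minimal sketch of one such stage written with PyTorch (the disclosure does not prescribe any particular framework): the elements of the 3x3 kernel matrices act as the trainable weights (w), and the output passes through one of the listed activation functions before being downsampled. The input size is an arbitrary example.

```python
import torch
import torch.nn as nn

# One convolution stage of the kind described above: the kernel elements are
# the weights (w), followed by an activation (leaky ReLU, one of the listed
# options) and a downsampling operation.
conv_stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.LeakyReLU(0.1),
    nn.MaxPool2d(kernel_size=2),   # downsampling
)

x = torch.randn(1, 3, 416, 416)    # an (R, G, B) robot image tensor (example size)
features = conv_stage(x)           # 1 x 16 x 208 x 208 feature map
```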
The feature detection model FDM includes a prediction network PN, a detection network DN, and a merge module MM.
Once a robot image is fed as input, the prediction network PN outputs a predicted value by performing a plurality of operations using the weights of a plurality of layers. That is, the prediction network PN may output a predicted value by dividing the robot image into a plurality of cells and then calculating, for each of a plurality of bounding boxes BB, the center coordinates (dx, dy) relative to the cell to which that bounding box BB belongs, the coordinates (dx, dy, w, h) of the bounding box defining its width (w) and height (h) about the center coordinates (dx, dy), the confidence, which indicates how likely it is that an object is present inside that bounding box BB, and the probability that the object inside that bounding box BB belongs to each of a plurality of classes, i.e., a revolute joint (R) class, a prismatic joint (P) class, etc. An example of the predicted value is depicted in
The detection network DN selects at least one of the plurality of bounding boxes corresponding to the predicted value and outputs it as a computed value. The detection network DN may produce the computed value by performing a plurality of operations using weights on the predicted value. For example, the detection network DN may output, as the computed value, a bounding box where the probability of the object in the plurality of bounding boxes belonging to a learned class is equal to or greater than a set threshold and the three-dimensional coordinates [(xi, yi, zi), i=1, 2, 3, . . . ] corresponding to the center coordinates (dx, dy) of that bounding box. For example, assuming that the threshold is 75%, the probability that an object within a bounding box BB of
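As a rough sketch of this selection step, and not the exact implementation of the detection network DN, the following assumes that each candidate produced by the prediction network PN carries its class probabilities and the three-dimensional coordinates mapped to its center; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    box: tuple          # (dx, dy, w, h) relative to its cell
    confidence: float   # how likely an object is inside the box
    class_probs: dict   # e.g. {"revolute": 0.93, "prismatic": 0.04}
    center_xyz: tuple   # (x, y, z) mapped to the box center (dx, dy)

def select_joints(candidates, threshold=0.75):
    """Keep candidates whose best class probability reaches the threshold,
    and report the joint type together with its 3D coordinates."""
    joints = []
    for c in candidates:
        joint_type, prob = max(c.class_probs.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            joints.append((joint_type, c.center_xyz))
    return joints
```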
Next, a method for generating a feature detection model FDM according to an embodiment of the present disclosure will be described.
Referring to
Next, the model generation part 110 feeds the robot image as input into the feature detection model FDM in the step S130. Then, the feature detection model FDM produces a computed value by performing a plurality of operations using weights on the robot image in the step S140. Here, the computed value includes a bounding box BB containing a class to be learned, for example, a revolute (R) class and a prismatic (P) class, and three-dimensional coordinates mapped to the center (dx, dy) of that bounding box BB.
Subsequently, the model generation part 110 calculates a loss representing the difference between the computed value and the label by using a loss function in the step S150. Here, the loss includes a shape loss and a spatial coordinate loss. The model generation part 110 may obtain the shape loss by using the loss function in the following Equation 1.
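The equation image of Equation 1 is not reproduced here. Based on the symbol definitions that follow, it appears to be a YOLO-style loss; a plausible reconstruction, with the confidence terms for boxes with and without objects written out separately and hatted symbols denoting the network's computed values and unhatted symbols the labels, is:

$$
\begin{aligned}
L_{1} ={}& \ \omega_{coord}\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbf{1}_{ij}^{obj}\Big[(dx_{ij}-\widehat{dx}_{ij})^{2}+(dy_{ij}-\widehat{dy}_{ij})^{2}\Big] \\
&+ \omega_{coord}\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbf{1}_{ij}^{obj}\Big[\big(\sqrt{w_{ij}}-\sqrt{\widehat{w}_{ij}}\big)^{2}+\big(\sqrt{h_{ij}}-\sqrt{\widehat{h}_{ij}}\big)^{2}\Big] \\
&+ \sum_{i=1}^{S}\sum_{j=1}^{B}\mathbf{1}_{ij}^{obj}\big(C_{ij}-\widehat{C}_{ij}\big)^{2}
  + \omega_{noobj}\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbf{1}_{ij}^{noobj}\big(C_{ij}-\widehat{C}_{ij}\big)^{2} \\
&+ \sum_{i=1}^{S}\mathbf{1}_{i}^{obj}\sum_{c}\big(P_{i}(c)-\widehat{P}_{i}(c)\big)^{2}
\end{aligned}
$$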
where S represents the number of cells, and B represents the number of bounding boxes in a cell. dx and dy represent the center coordinates of a bounding box, and w and h represent the width and height of the bounding box. C represents the confidence score. P_i(c) represents the probability that the object in the ith cell belongs to the corresponding class (c). Here, i is an index representing a cell where an object is present, and j is an index representing a predicted bounding box. ω_coord is a parameter that gives a higher weight to the bounding-box variables, balancing the loss on the coordinates (dx, dy, w, h) of the bounding box against the other losses. ω_noobj gives a lower weight to bounding boxes in areas where no object is present; that is, ω_noobj is a parameter for balancing between bounding boxes with and without objects. 1_i^obj indicates that an object is present in cell i, and 1_ij^obj indicates that the jth bounding box in cell i contains the object.
The first and second terms of Equation 1 are used to calculate a coordinate loss representing the difference between the coordinates (dx, dy, w, h) of a bounding box and the coordinates of the area where an object of a class to be learned is present. Also, the third term of Equation 1 is used to calculate a confidence loss representing the difference between the area of the bounding box BB and that of a ground-truth box which is 100% likely to contain the object. Lastly, the last term of Equation 1 is used to calculate a classification loss representing the difference between the class predicted for the object in the bounding box BB and the actual class of the object.
Moreover, the model generation part 110 may obtain the spatial coordinate loss by the loss function of the following Equation 2.
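Equation 2 is likewise not reproduced here; from the symbol definitions below, a plausible reconstruction is a squared-error loss over the three-dimensional coordinates assigned to each bounding box that contains an object:

$$
L_{2}=\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbf{1}_{ij}^{obj}\Big[\big(x_{ij}-\hat{x}_{ij}\big)^{2}+\big(y_{ij}-\hat{y}_{ij}\big)^{2}+\big(z_{ij}-\hat{z}_{ij}\big)^{2}\Big]
$$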
wherein x, y, and z represent three-dimensional coordinates corresponding to the center coordinates (dx, dy) of the bounding box. The other parameters in Equation 2 are the same as in Equation 1.
Next, the model generation part 110 performs optimization to modify the weights of the feature detection model FDM, including the prediction network PN and the detection network DN, so as to minimize every loss derived through the loss functions in the step S160. That is, the model generation part 110 performs optimization to modify the weights of the feature detection model FDM so as to minimize the shape loss, comprising the coordinate loss, confidence loss, and classification loss derived through the loss function of Equation 1, and the spatial coordinate loss derived through the loss function of Equation 2.
In the above-described steps S120 to S160, the weights (w) of the feature detection model FDM are repeatedly updated using a plurality of different robot images CH1 and CH2 for training, or batches of such robot images for training. Such repetition continues until three losses, namely the shape loss, the spatial coordinate loss, and the total loss combining the shape loss and the spatial coordinate loss, are at or below a preset target value. Accordingly, the model generation part 110 determines whether these three losses are at or below the preset target value in the step S170 and, if so, finishes learning in the step S180.
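By way of illustration only, steps S120 to S180 could be organized as the following training loop; the optimizer, learning rate, batching, and target value are assumptions and are not part of the disclosure.

```python
import torch

def train_fdm(fdm, loader, shape_loss_fn, spatial_loss_fn,
              target=0.01, max_epochs=1000, lr=1e-4):
    """Repeat steps S120 to S160 until the shape loss, the spatial coordinate
    loss, and their total are all at or below the target (S170), then finish (S180)."""
    optimizer = torch.optim.Adam(fdm.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for images, labels in loader:                         # robot images for training
            computed = fdm(images)                            # computed value (S140)
            shape_loss = shape_loss_fn(computed, labels)      # Equation 1 (S150)
            spatial_loss = spatial_loss_fn(computed, labels)  # Equation 2 (S150)
            total_loss = shape_loss + spatial_loss
            optimizer.zero_grad()
            total_loss.backward()                             # modify the weights (S160)
            optimizer.step()
        if (shape_loss.item() <= target and spatial_loss.item() <= target
                and total_loss.item() <= target):
            break                                             # target reached (S170, S180)
    return fdm
```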
Next, a method for constructing kinematic information of a robot manipulator according to an embodiment of the present disclosure will be described.
Referring to
At this time, the feature detection part 300 may set reference coordinates in the robot image in the step S220. According to an embodiment, reference coordinates (n1) [0, 0, 0] may be set in response to a user input, as depicted in
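A minimal sketch of what setting the reference coordinates amounts to, under the assumption that the user picks one pixel whose three-dimensional point becomes the origin n1 = [0, 0, 0]; the function name is hypothetical.

```python
import numpy as np

def apply_reference(point_cloud, ref_pixel):
    """Re-express every 3D coordinate relative to the user-selected
    reference point n1, which then becomes [0, 0, 0]."""
    v, u = ref_pixel                   # pixel chosen by the user
    n1 = point_cloud[v, u].copy()      # its camera-frame coordinates
    return point_cloud - n1            # coordinates relative to n1
```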
Next, the feature detection part 300 detects the type of each of a plurality of joints of the robot manipulator and the three-dimensional coordinates of the joint by using a feature detection model generated through deep learning based on the robot image containing the shape information and the coordinate information in the step S230. More specifically, in the step S230, once the feature detection part 300 feeds the robot image containing the shape information and the coordinate information as input into the feature detection model generated through deep learning, the feature detection model FDM produces a computed value for each of a plurality of joints of the robot manipulator included in the robot image, including the type of the joint and the three-dimensional coordinates of the joint, by performing operations using learned weights on the shape information and coordinate information of the robot image.
Next, the variable derivation part 400 allocates a joint coordinate system to each of the plurality of joints according to the type of joint indicated by the computed value in the step S240. The variable derivation part 400 then derives the DH parameters, i.e., a(i), d(i), α(i), and θ(i), according to the DH notation based on the allocated joint coordinate system and the detected three-dimensional coordinates of the joint in the step S250. Here, the variable inside the parentheses is a subscript. Because the variable derivation part 400 knows both the allocated joint coordinate system and the detected three-dimensional coordinates of each joint, it can obtain the required distances and angles from them.
For example, as depicted in
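By way of illustration only (the disclosure does not specify this computation), once the joint coordinate systems have been allocated following the DH convention, the four parameters can be read off the homogeneous transform between consecutive joint frames; the helper below assumes such a transform is available and that it has the standard DH structure.

```python
import numpy as np

def dh_from_transform(T):
    """Extract (a, d, alpha, theta) from a 4x4 homogeneous transform between
    consecutive joint frames, assuming the frames were assigned per the DH
    convention so that T has the standard DH structure."""
    theta = np.arctan2(T[1, 0], T[0, 0])                     # rotation about z(i-1)
    alpha = np.arctan2(T[2, 1], T[2, 2])                     # twist about x(i)
    d = T[2, 3]                                              # offset along z(i-1)
    a = T[0, 3] * np.cos(theta) + T[1, 3] * np.sin(theta)    # link length along x(i)
    return a, d, alpha, theta
```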
Meanwhile, since the derived DH parameters may have error, the feature detection part 300 may optionally perform DH parameter modification and robot kinematic calibration in response to user input, by using a detailed specification of the designed robot manipulator in the step S260.
The variable derivation part 400 feeds the DH parameters derived above, i.e., a(i), d(i), α(i), and θ(i), as input into a robot controller in the step S270.
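For reference, under the standard (classic) DH convention, which the derived parameters follow but which the disclosure does not recite explicitly, each set of parameters defines a link transform that the robot controller can chain for direct-kinematics analysis:

$$
{}^{i-1}T_{i}=
\begin{bmatrix}
\cos\theta_{i} & -\sin\theta_{i}\cos\alpha_{i} & \sin\theta_{i}\sin\alpha_{i} & a_{i}\cos\theta_{i}\\
\sin\theta_{i} & \cos\theta_{i}\cos\alpha_{i} & -\cos\theta_{i}\sin\alpha_{i} & a_{i}\sin\theta_{i}\\
0 & \sin\alpha_{i} & \cos\alpha_{i} & d_{i}\\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
{}^{0}T_{n}=\prod_{i=1}^{n}{}^{i-1}T_{i}
$$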
According to the present disclosure described above, a user of a robot manipulator can easily obtain kinematic information for various types of manipulators, even without expert knowledge. Moreover, the user can extract and modify the DH parameters required for precise control of the manipulator. These DH parameters may be applied to robot controllers and make the analysis of direct kinematics, inverse kinematics, and kinetics easier. Therefore, users involved in developing general manufacturing equipment are also able to design a desired type of robot manipulator and easily customize it to actual on-site situations without expert knowledge.
In the embodiment of
The processor TN110 may execute program commands stored in at least one of the memory TN130 and the storage device TN140. The processor TN110 may refer to a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to exemplary embodiments of the present disclosure are performed. The processor TN110 may be configured to implement the procedures, functions, methods, etc. described in connection with an exemplary embodiment of the present disclosure. The processor TN110 may control each component of the computing device TN100.
Each of the memory TN130 and the storage device TN140 may store various information related to the operation of the processor TN110. Each of the memory TN130 and the storage device TN140 may be composed of at least one of a volatile storage medium and a nonvolatile storage medium. For example, the memory TN130 may be composed of at least one of a read only memory (ROM) and a random access memory (RAM).
The transceiver TN120 may transmit or receive a wired signal or a wireless signal. The transceiver TN120 may be connected to a network to perform communication.
Meanwhile, various methods according to an exemplary embodiment described above may be implemented in the form of a readable program through various computer means to be recorded in a computer-readable recording medium. Here, the recording medium may include program commands, data files, data structures, etc. alone or in combination. The program commands recorded on the recording medium may be those specially designed and configured for an exemplary embodiment or may also be known and available to those skilled in the art of computer software. For example, the recording medium may include magnetic media such as a hard disk, a floppy disk, and a magnetic tape, optical media such as CD-ROM and DVD, magneto-optical media such as a floptical disk, and hardware devices, such as ROM, RAM, and flash memory, that are specially configured to store and execute the program commands. Examples of the program commands may include a high-level programming language that may be executed by a computer using an interpreter as well as a machine language code made by a compiler. Such a hardware device may be configured to operate as one or more software modules in order to perform an operation of the present disclosure, and vice versa.
Although the present disclosure has been described with reference to several exemplary embodiments, these embodiments are merely illustrative and should not be considered as limiting. As such, it will be understood by those skilled in the art that various changes and modifications may be made by the doctrine of equivalents without departing from the spirit of the present disclosure and the scope of rights presented in the claims.