The present disclosure relates generally to control systems for mechanical manipulators, and more particularly to a system and a method for controlling an underactuated mechanical system having multiple degrees of freedom and different kinds of actuators.
A mechanical system, e.g., a robotic manipulator, is configured to track a reference trajectory for performing a task. The task, for example, corresponds to moving an object to a target position, or an assembly operation. To control the mechanical system to track the reference trajectory, a model of dynamics of the mechanical system is utilized. The model of dynamics is a mathematical model that includes equations which define dynamics of the mechanical system. Some approaches use a model of inverse dynamics of the mechanical system to control the mechanical system as the model of inverse dynamics enables controlling of the mechanical system. The model of inverse dynamics expresses joint torques as a function of joint positions, velocities and accelerations. It is desired to formulate an accurate model of the inverse dynamics for controlling the mechanical system.
Under-actuated robots (UR) are an important class of mechanical systems. UR are mechanical systems characterized by fewer control inputs than degrees of freedom (DOF). Such systems are ubiquitous in robotics: examples are manipulators with passive joints, autonomous bicycles and motorcycles, bipedal robots, and most aerospace and marine vehicles. Several Reinforcement Learning algorithms have been applied to the UR control problem. These algorithms aim at automatically learning a control law. However, such methods typically require a large number of interactions with the system and do not provide any performance guarantees. Even control strategies which combine model learning with classic model-based control methods rely on the inverse dynamics model, which relates torques to the robot trajectories. However, learning the inverse dynamics model is a challenging and largely unexplored task for control of UR systems.
Some example embodiments are based on the realization that for effective control of mechanical systems such as robots, an accurate model of the dynamics of such a mechanical system is required. The model of dynamics is a mathematical model including equations which define dynamics of the mechanical system. The model of dynamics may be a model of forward dynamics that connects control commands and current states of the different actuators to transitioned states of the different actuators achieved by executing the control commands. Alternately, the model of dynamics may be a model of inverse dynamics that maps the current states and the transitioned states of the different actuators to corresponding torques for the different actuators. For efficient control of such systems, one approach is to use the model of inverse dynamics of the mechanical system since the model of inverse dynamics enables controlling of the mechanical system. The model of inverse dynamics expresses joint torques as a function of joint positions, velocities and accelerations. It is an object of some embodiments to control a mechanical system using a model of inverse dynamics. In this regard, it is also an objective of some embodiments to derive an accurate inverse dynamics model of the mechanical system. The mechanical system may be a robotic system having different actuators and multiple degrees of freedom to track a reference trajectory for performing a task. For example, the robotic system may be a manipulator configured to track the reference trajectory for performing a task of moving an object to a target location. The robotic manipulator includes joints and links. Each joint is actuated by an actuator such as an electric motor. A motion given by the actuator makes the link attached to the joint move.
Generally, an accurate physical model of the inverse dynamics of such mechanical systems is difficult and time-consuming to generate. Conventional model-based approaches which derive parametric models directly from first principles of physics are often limited in performance by both the presence of parametric uncertainty and the inability to describe certain complex dynamics typical of real systems, such as motor friction or joint elasticity. Accordingly, it is advantageous to utilize machine learning-based approaches for deriving inverse dynamics models of such systems. Some machine learning-based approaches in this regard are mainly based on deep neural networks (NN) and Gaussian Process Regression (GPR). Some embodiments realized that in this context, both gray-box and black-box approaches may be suitable. Within gray-box techniques, a model-based component encoding the known dynamics may be combined with a data-driven one, which can compensate for modeling errors and unknown dynamical effects.
However, it is a realization of some embodiments that the performance of these methods strongly depends on the effectiveness of the model-based component, so they still require deriving sufficiently accurate physical models, which might be particularly time-consuming and complex if some parameters are unknown or not known precisely. In contrast, pure black-box methods learn inverse dynamics models directly from experimental data, without requiring deep knowledge of the underlying physical system. Despite their ability to approximate even complex non-linear dynamics, pure black-box methods typically suffer from low data efficiency and poor generalization properties: learned models require a large number of samples to be trained and extrapolate only within a neighborhood of the training trajectories.
Some embodiments realized that to overcome the aforementioned limitations of black-box techniques in the context of NN and for the GPR framework, a promising class of solutions is represented by Physics Informed Learning (PIL), which proposes to embed insights from physics as a prior in black-box models. Instead of learning the inverse dynamics in an unstructured manner, which makes the problem unnecessarily complex, some embodiments embed physical properties in the model to improve generalization and data efficiency. Accordingly, some embodiments provide a PIL model for inverse dynamics identification of mechanical systems based on GPR. A standard approach for applying GPR to the inverse dynamics identification involves modeling each joint torque directly with a distinct Gaussian Process (GP), assuming the GPs independent of one another given the current joint position, velocity, and acceleration. However, some embodiments realized that such single-output approaches ignore the correlations between the different joint torques imposed by the Lagrangian equations, which in turn limits generalization and data efficiency.
Inspired by these realizations, some embodiments provide a multi-output GPR estimator based on a novel kernel function referred to as a Lagrangian Inspired Polynomial kernel (LIP), which exploits Lagrangian mechanics to model the correlations between the different joint torques, instead of modelling each joint torque with a distinct Gaussian Process (GP), assuming the GPs independent of one another given the current joint position, velocity, and acceleration. Some embodiments exploit the fact that the dynamics equations are linear with respect to the Lagrangian, to obtain the Gaussian Processes (GPs) of the torques by applying a set of linear operators to the GPs of the potential and kinetic energy of the mechanical system. Some embodiments recognize that the kinetic and potential energy are polynomial functions in a suitable input space and derive a polynomial kernel that encodes this property.
Accordingly, some example embodiments derive the LIP estimator as a black-box multi-output GPR model which encodes the symmetries typical of Lagrangian systems. The LIP model estimates the kinetic and potential energy in a principled way, allowing its integration with energy-based control strategies. Since the LIP estimator encodes physical properties in the model, the LIP estimator outperforms state-of-the-art black-box GP estimators as well as NN-based solutions, obtaining better data efficiency and generalization performance.
Some embodiments are also based on the realization that training the model of inverse dynamics is difficult because of the lack of structure of the model of inverse dynamics, which fails to describe the physical principles that govern the dynamics of the multiple degrees of freedom of the mechanical system. For example, the robotic manipulator may have joints that define the multiple degrees of freedom, and the dynamics of the joints are correlated with each other. Therefore, there exists a complex and potentially non-linear correlation between the torques for the joints needed to track the reference trajectory. Such a correlation is challenging to learn through training. Also, for some special types of mechanical systems, such as underactuated robots which have fewer actuators than degrees of freedom, learning inverse dynamics models for control is particularly challenging because underactuation further exacerbates the aforementioned problems. For example, torques of the underactuated dimensions are constant signals equal to zero, leading to an ill-posed estimation problem.
Towards this end, some embodiments are directed towards physics-informed model-based solutions for controlling such systems. Some embodiments are based on the recognition that Gaussian Process Regression (GPR) can be used to learn the correlation between torques of the joints. For example, several Gaussian process-based solutions use GPR to model the n torque components, one for each degree of freedom, with n independent Gaussian Processes (GPs), and include a model-based component in the mean function or the covariance. As a result, the covariance matrix of such a GPR model is either diagonal or block diagonal, which means that the correlation between torques of the different actuators is not captured.
Some embodiments are based on the realization that such a deficiency is caused, at least in part, by an attempt to model the torque itself as a Gaussian process. However, some embodiments are based on the realization that the Gaussian process can be designed to model kinetic and potential energy of the mechanical system. In contrast with modeling individual torques, modeling the energy captures mutual effects of the torques of the different actuators on each other, which in turn allows the correlation among the torques of the different actuators to be learned. As a result, the covariance matrix capturing correlations between the torques of the different actuators is a full matrix with non-zero elements both on and off the diagonal.
Towards this end, some embodiments provide an inverse dynamics model that models energy of the mechanical system with a GPR having a full prior and posterior covariance matrix that captures the correlations between the torques of the different actuators. The inverse dynamics model is trained with machine learning to map the dynamic states of the different actuators and joints to corresponding torques for the different actuators. According to an embodiment, dynamic states of the different actuators and joints are processed with the inverse dynamics model to produce values of the torques for the different actuators of the mechanical system and estimate values of the potential and kinetic energy of the mechanical system. Further, the mechanical system is controlled based on the values of the torques and the values of the potential and kinetic energy of the mechanical system. For instance, control commands are determined based on the values of the torques for the different actuators and the values of the potential and kinetic energy of the mechanical system. Further, the determined control commands are applied to the different actuators to control the mechanical system.
Some example embodiments are directed towards model and control of underactuated robots with energy-based techniques. In this regard, some example embodiments provide techniques for modelling and synthesis of energy-based controllers and control methods for UR systems. Some embodiments provide a Lagrangian Inspired Polynomial (LIP) estimator as a black-box estimator based on Gaussian Process Regression. The LIP estimator relies on a multidimensional, multi-output kernel that embeds the structure of the Euler-Lagrange equation. According to some embodiments, the LIP estimator learns various components of the inverse dynamics map, as well as the kinetic and potential energies of the UR system. Some embodiments utilize the LIP estimator to estimate values of kinetic and potential energies of the UR system, as well as the inertial, Coriolis, and gravity components directly from the overall torque measures. Some embodiments further utilize these properties to derive an energy-based controller for the stabilization and control of complex robots such as UR. The energy-based controller performs a partial feedback linearization on the actuated system and a regulation of the energy to steer the non-actuated system to a trajectory passing through the unstable equilibrium. Once the system is sufficiently close to the target, the control is switched to a Linear Quadratic Regulator (LQR) controller. The LIP model is suitable to implement this kind of controller since it returns the inertia matrix, the Coriolis and gravity torques, energy estimates, and the linearization of the system dynamics required by the LQR.
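The switching structure described above (an energy-regulating phase followed by an LQR phase near the target) can be sketched as follows. This is only an illustrative sketch under stated assumptions: a single scalar control input, a Euclidean distance test as the switching criterion, a precomputed LQR gain vector, and a caller-supplied `energy_law` callable. The names `hybrid_control`, `energy_law`, `lqr_gain` and `switch_radius` are hypothetical and do not come from the disclosure.

```python
import math

def hybrid_control(state, target, energy, target_energy,
                   energy_law, lqr_gain, switch_radius):
    """Return (control input, active mode) for a scalar-input system.

    Assumption: the switch happens when the Euclidean distance between the
    state and the target state drops below switch_radius.
    """
    error = [s - t for s, t in zip(state, target)]
    distance = math.sqrt(sum(e * e for e in error))
    if distance < switch_radius:
        # Near the target: linear state feedback u = -K (x - x_target).
        return -sum(k * e for k, e in zip(lqr_gain, error)), "lqr"
    # Far from the target: regulate the energy toward the target energy.
    return energy_law(target_energy - energy, state), "energy"

# Placeholder energy law (hypothetical): proportional to the energy error
# and to the second state component (a velocity in this toy example).
toy_law = lambda energy_error, state: 0.5 * energy_error * state[1]

u_far, mode_far = hybrid_control([3.0, 1.0], [0.0, 0.0], 1.0, 2.0,
                                 toy_law, [10.0, 2.0], 0.2)
u_near, mode_near = hybrid_control([0.1, 0.0], [0.0, 0.0], 2.0, 2.0,
                                   toy_law, [10.0, 2.0], 0.2)
print(mode_far, u_far, mode_near, u_near)
```

In such a scheme the energy estimate passed as `energy` would come from the learned model, and the gain `lqr_gain` would be computed from the linearization of the learned dynamics around the target.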
In order to realize the aforementioned objectives and advantages, various embodiments of this disclosure provide feedback controllers, feedback control methods and systems for controlling a mechanical system such as an underactuated robot in accordance with an inverse dynamics model of the system that is learned through machine learning techniques involving GPR.
According to some embodiments, a feedback controller for controlling a mechanical system to perform a task or to track a reference trajectory for performing a task is provided. The mechanical system has multiple degrees of freedom and comprises a plurality of actuators. The feedback controller comprises a memory configured to store an energy-based inverse dynamics model and computer program instructions and a processor configured to execute the instructions for controlling the mechanical system. The energy-based inverse dynamics model is trained with machine learning to map dynamic states of the mechanical system to corresponding torques for the plurality of actuators. In this regard, the energy-based inverse dynamics model is configured to model potential and kinetic energy of the mechanical system as Gaussian Processes of the dynamic states and derive Gaussian Processes for the torques from the Gaussian Processes of the dynamic states of the mechanical system based on physics of relationship between the torques and the potential and kinetic energy of the mechanical system. The processor executes the instructions to collect a feedback signal of an operation of the mechanical system, the feedback signal including current states of dynamics of the mechanical system indicative of a position, a velocity, and an acceleration of each joint of the plurality of joints of the mechanical system. The processor is further configured to process the current states of dynamics with the energy-based inverse dynamics model to produce values of the torques for the plurality of actuators and values of the potential and kinetic energy of the mechanical system. The processor controls the mechanical system based on the produced values of the torques for the plurality of actuators of the mechanical system and the values of the potential and kinetic energy of the mechanical system.
According to some other embodiments, a method for controlling a mechanical system to track a reference trajectory for performing a task is provided. The mechanical system has multiple degrees of freedom and comprises a plurality of actuators. The method utilizes an energy-based inverse dynamics model trained with machine learning to map dynamic states of the mechanical system to corresponding torques for the plurality of actuators. In this regard, the energy-based inverse dynamics model is configured to model potential and kinetic energy of the mechanical system as Gaussian Processes of the dynamic states and derive Gaussian Processes for the torques from the Gaussian Processes of the dynamic states of the mechanical system based on physics of relationship between the torques and the potential and kinetic energy of the mechanical system. The method comprises collecting a feedback signal of an operation of the mechanical system, the feedback signal including current states of dynamics of the mechanical system indicative of a position, a velocity, and an acceleration of each joint of the plurality of joints of the mechanical system. The method further comprises processing the current states of dynamics with the energy-based inverse dynamics model to produce values of the torques for the plurality of actuators and values of the potential and kinetic energy of the mechanical system. The method further comprises controlling the mechanical system based on the produced values of the torques for the plurality of actuators of the mechanical system and the values of the potential and kinetic energy of the mechanical system.
The presently disclosed embodiments will be further explained with reference to the following drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it can be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings may indicate like elements.
Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
According to some embodiments, the controller 101 controls the mechanical system 109 to track a reference trajectory for performing a task. The mechanical system 109 has multiple degrees of freedom (DoF) and comprises a plurality of joints that are movable. In this regard, the mechanical system 109 comprises a plurality of actuators and one or more joints of the system 109 are movable by one or more of the actuators. According to some embodiments, the mechanical system 109 may be an underactuated system such as an underactuated robot where at least one joint of the underactuated system is not directly movable by any actuator. “Underactuated” in the present disclosure may correspond to a jointed mechanism in which not all of the joints of the mechanism are actuated, i.e., some of the joints are passive or unactuated. An under-actuated robot is a robot in which the number of degrees of freedom of motion that can be directly controlled by the mounted actuators is less than the total number of degrees of freedom of motion of the robot. According to some embodiments, the mechanical system 109 may have all joints directly movable by at least one actuator. However, in some special applications, the mechanical system 109 may be converted into an underactuated system. For example, when at least one actuator associated with at least one joint does not move the corresponding joint, the system 109 may operate as an underactuated system. Such scenarios may arise when one or more actuators are deliberately not fired but the corresponding joint is intended to be moved for performing a task. Alternately or additionally, such scenarios may also arise when one or more actuators malfunction due to operational reasons or due to breakdown in the underlying control system or communication system of such an actuator.
The controller 101 may also be coupled to suitable interfaces to collect a feedback signal for an operation of the mechanical system 109. The feedback signal may include current state of dynamics of the mechanical system 109. The current state of dynamics of the system 109 may be indicative of a position, a velocity, and an acceleration of each joint of the plurality of joints of the mechanical system 109. In this regard, the controller 101 may interface with suitable sensing circuitry that provides measures of the current state of dynamics of the system 109. In some embodiments, the controller 101 may compute the current state of dynamics of the system 109 from one or more observations pertaining to the operation of the mechanical system 109.
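As an illustration of computing the state of dynamics from observations, the sketch below estimates joint velocities and accelerations from three consecutive position samples by central finite differences. The function name `estimate_state` and the three-sample window are assumptions made for this example; a real-time controller would more likely use backward differences or a filter, and the disclosure does not prescribe any particular scheme.

```python
def estimate_state(q_prev, q_curr, q_next, dt):
    """Estimate (position, velocity, acceleration) of each joint at the
    middle sample from three equally spaced position readings."""
    velocity = [(b - a) / (2.0 * dt) for a, b in zip(q_prev, q_next)]
    acceleration = [(a - 2.0 * c + b) / dt ** 2
                    for a, c, b in zip(q_prev, q_curr, q_next)]
    return list(q_curr), velocity, acceleration

# One joint following q(t) = t^2, sampled at t = 0.9, 1.0, 1.1.
q, qd, qdd = estimate_state([0.81], [1.0], [1.21], dt=0.1)
print(q, qd, qdd)
```

For the quadratic test trajectory the central differences are exact up to rounding, so the estimates are close to a velocity of 2 and an acceleration of 2 at t = 1.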
The energy-based inverse dynamics model 107 defines a mathematical model of the inverse dynamics of the mechanical system 109. The model 107 maps the current states and the transitioned states of the different actuators to corresponding torques for the different actuators. The model 107 expresses joint torques as a function of joint positions, velocities and accelerations. Some embodiments utilize machine learning-based approaches for deriving the energy-based inverse dynamics model 107 of the system 109. The model 107 embeds the physical relationships between the torques of the actuators to improve generalization and data efficiency. In this regard, some embodiments provide the model 107 as a physics-informed learned model for inverse dynamics identification of the mechanical system 109 based on Gaussian Process Regression (GPR). To model the correlations between the different joint torques, some embodiments realize the energy-based inverse dynamics model 107 as a multi-output GPR estimator based on a novel kernel function referred to as a Lagrangian Inspired Polynomial kernel (LIP), which exploits Lagrangian mechanics to model the correlations between the different joint torques, instead of modelling each joint torque with a distinct Gaussian Process (GP), assuming the GPs independent of one another given the current joint position, velocity, and acceleration. Details of the structure and training of the model 107 are described later in the disclosure.
The controller 101 processes the feedback signal to produce values of torques for the different actuators of the system 109. In some embodiments, the produced values of the torques may be translated into control commands specifying values of currents, voltages or other physical parameters of the actuators. The control commands may be applied to the different actuators. The control commands cause the different actuators to change their states to track the reference trajectory. The changed states of the different actuators may be input to a feedback controller 111. Based on the changed states of the different actuators, the feedback controller 111 provides a correction to the values of the torques to compensate for errors in the positions and the velocities of the joints to accurately track the reference trajectory.
Using the values of the torques and the values of the kinetic energy and potential energy of the system 109, control commands may be generated to control 156 the mechanical system 109. For example, the determined control commands may be applied to the different actuators. The control commands cause the different actuators to change their states to track the reference trajectory. The changed states of the different actuators may be input to the feedback controller 111. Based on the changed states of the different actuators, the feedback controller 111 may be configured to provide a correction to the values of the torques to compensate for errors in the positions and the velocities of the joints to accurately track the reference trajectory.
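The correction step can be illustrated with a proportional-derivative (PD) term added to the model torques. The PD form, the function name `corrected_torques` and the gains `kp` and `kd` are assumptions for this sketch; the disclosure only states that a correction compensates for position and velocity errors, without fixing the correction law.

```python
def corrected_torques(tau_model, q_ref, qd_ref, q, qd, kp, kd):
    """Add a per-joint PD correction (assumed form) to the torques
    predicted by the inverse dynamics model."""
    return [tau + p * (qr - qi) + d * (qdr - qdi)
            for tau, qr, qdr, qi, qdi, p, d
            in zip(tau_model, q_ref, qd_ref, q, qd, kp, kd)]

# Two joints: the first lags its reference position by 0.1 rad,
# the second tracks perfectly and therefore receives no correction.
tau = corrected_torques(tau_model=[1.0, -0.5],
                        q_ref=[0.5, 0.2], qd_ref=[0.0, 0.0],
                        q=[0.4, 0.2], qd=[0.0, 0.0],
                        kp=[20.0, 20.0], kd=[2.0, 2.0])
print(tau)
```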
In an embodiment, the mechanical system 109 corresponds to a robotic manipulator having different actuators and multiple degrees of freedom to track a reference trajectory for performing a task. An example robotic manipulator is described next with reference to
Further, the different actuators are equipped with position sensors (such as encoders) that can measure current positions of the joints. In some embodiments, states of the different actuators are defined as positions of the joints. In some other embodiments, the states of the different actuators are defined as a combination of the positions of the joints, velocities of the joints, and/or accelerations of the joints.
According to some alternate embodiments, the manipulator 200 may be an underactuated manipulator where at least one joint of the manipulator 200 is only indirectly movable by the actuators of the manipulator. Such a joint may be referred to as an unactuated joint. For example, the unactuated joint may not be associated with a corresponding actuator(s). Alternately, the unactuated joint may have a corresponding actuator that has malfunctioned due to operational reasons. The absence of an actuator for one or more joints or presence of an unactuated joint in the robotic manipulator operationally makes it an underactuated mechanical system that is characterized by fewer control inputs than degrees of freedom. Such systems are ubiquitous in robotics: examples are manipulators with passive joints, autonomous bicycles and motorcycles, bipedal robots, and most of the aerospace and marine vehicles.
For such underactuated robotic systems, learning inverse dynamics models for control is particularly challenging. For example, torques of the underactuated dimensions are constant signals equal to zero, leading to an ill-posed estimation problem. It is a realization of several embodiments that Gaussian Process Regression (GPR) may be used to learn the correlation between torques of the joints. In this regard, some embodiments realize that the Gaussian process can be designed to model energy of the mechanical system. In contrast with modeling individual torques, modeling the energy captures mutual effects of the torques of the different actuators on each other, which in turn allows the correlation among the torques of the different actuators to be learned. As a result, the covariance matrix capturing correlations between the torques of the different actuators is a full matrix with non-zero elements both on and off the diagonal.
To that end, at block 301 of the method 300, a Lagrangian function that describes the mechanical system 109 in terms of energies of the mechanical system 109, i.e., kinetic energy and potential energy, is defined. For instance, the Lagrangian function $\mathcal{L}(q, \dot{q})$ is defined as the difference between the kinetic energy $\mathcal{T}(q, \dot{q})$ and the potential energy $\mathcal{V}(q)$ of the mechanical system 109:

$$\mathcal{L}(q, \dot{q}) = \mathcal{T}(q, \dot{q}) - \mathcal{V}(q).$$
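At block 303 of the method 300, the kinetic energy $\mathcal{T}(q, \dot{q})$ and the potential energy $\mathcal{V}(q)$ are defined as two independent zero-mean Gaussian Processes (GPs) with covariance determined by kernel functions $k^{\mathcal{T}}(x, x')$ and $k^{\mathcal{V}}(x, x')$, respectively:

$$\mathcal{T} \sim GP\big(0, k^{\mathcal{T}}(x, x')\big), \qquad \mathcal{V} \sim GP\big(0, k^{\mathcal{V}}(x, x')\big).$$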
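At block 305 of the method 300, a Lagrangian operator $\mathcal{G}_i$ that maps the Lagrangian function $\mathcal{L}$ of the mechanical system 109 to the torques $\tau_i$ of the different actuators is defined by a set of partial differential equations (the Euler-Lagrange equations) as

$$\tau_i = \mathcal{G}_i \mathcal{L} = \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}_i} - \frac{\partial \mathcal{L}}{\partial q_i}, \qquad i = 1, \dots, n.$$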
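To make the action of the Lagrangian operator concrete, the following sketch applies the Euler-Lagrange equation to the Lagrangian of a single pendulum and compares the result with the textbook closed-form torque. The pendulum, its parameters (`m`, `l`, `g`) and the finite-difference step `h` are assumptions chosen for illustration and are not taken from the disclosure.

```python
import math

def lagrangian(q, qd, m=1.0, l=0.5, g=9.81):
    """Pendulum Lagrangian (angle measured from the downward vertical):
    kinetic energy minus potential energy."""
    kinetic = 0.5 * m * l ** 2 * qd ** 2
    potential = -m * g * l * math.cos(q)
    return kinetic - potential

def torque_from_lagrangian(L, q, qd, qdd, h=1e-4):
    """Apply the Euler-Lagrange operator to L with central differences."""
    d2_qd_qd = (L(q, qd + h) - 2.0 * L(q, qd) + L(q, qd - h)) / h ** 2
    d2_qd_q = (L(q + h, qd + h) - L(q + h, qd - h)
               - L(q - h, qd + h) + L(q - h, qd - h)) / (4.0 * h ** 2)
    d_q = (L(q + h, qd) - L(q - h, qd)) / (2.0 * h)
    # d/dt (dL/dqd) expanded with the chain rule, minus dL/dq.
    return d2_qd_qd * qdd + d2_qd_q * qd - d_q

q, qd, qdd = 0.3, 1.2, -0.8
tau_numeric = torque_from_lagrangian(lagrangian, q, qd, qdd)
# Closed form for the same pendulum: m l^2 qdd + m g l sin(q).
tau_closed_form = 1.0 * 0.5 ** 2 * qdd + 1.0 * 9.81 * 0.5 * math.sin(q)
print(tau_numeric, tau_closed_form)
```

The total time derivative is expanded with the chain rule, so the operator only needs partial derivatives of the Lagrangian with respect to the position and velocity, together with the measured acceleration.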
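Some embodiments define the model of the Lagrangian $\mathcal{L}$ as a GP since the kinetic energy $\mathcal{T}$ and the potential energy $\mathcal{V}$ are two independent GPs. This is because the sum of two independent GPs is a GP, and its kernel is the sum of the kernels, namely,

$$\mathcal{L} \sim GP\big(0, k^{\mathcal{L}}(x, x')\big), \qquad k^{\mathcal{L}}(x, x') = k^{\mathcal{T}}(x, x') + k^{\mathcal{V}}(x, x').$$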
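A short worked derivation may help to see why the kernels add even though the Lagrangian is a difference of the energies. The notation here is assumed for illustration: $\mathcal{T}$ and $\mathcal{V}$ denote the kinetic and potential energy, modeled as independent zero-mean GPs with kernels $k^{\mathcal{T}}$ and $k^{\mathcal{V}}$, and $\mathcal{L} = \mathcal{T} - \mathcal{V}$.

```latex
\begin{aligned}
\operatorname{Cov}\big[\mathcal{L}(x), \mathcal{L}(x')\big]
 &= \mathbb{E}\big[(\mathcal{T}(x) - \mathcal{V}(x))(\mathcal{T}(x') - \mathcal{V}(x'))\big] \\
 &= \mathbb{E}[\mathcal{T}(x)\mathcal{T}(x')]
    - \mathbb{E}[\mathcal{T}(x)\mathcal{V}(x')]
    - \mathbb{E}[\mathcal{V}(x)\mathcal{T}(x')]
    + \mathbb{E}[\mathcal{V}(x)\mathcal{V}(x')] \\
 &= k^{\mathcal{T}}(x, x') + k^{\mathcal{V}}(x, x').
\end{aligned}
```

The two cross terms vanish because the processes are independent and zero-mean, and the minus sign in front of $\mathcal{V}$ appears twice in the last term, so it cancels.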
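Further, applying the properties of GPs under linear operators to $\tau = \mathcal{G}\mathcal{L}(x)$, where $\mathcal{G}$ is the Lagrangian operator and $\mathcal{L}$ is the Lagrangian, the inverse dynamics is modeled as $\tau \sim GP(0, k^{\tau}(x, x'))$, a zero-mean GP with covariance function $k^{\tau}(x, x')$ named the Lagrangian polynomial kernel.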
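To this end, at block 307 of the method 300, the Lagrangian polynomial kernel $k^{\tau}$ that defines the inverse dynamics model 107 is computed based on the Lagrangian operator $\mathcal{G}$, the kernel function $k^{\mathcal{T}}$ of the kinetic energy and the kernel function $k^{\mathcal{V}}$ of the potential energy as

$$\big[k^{\tau}(x, x')\big]_{ij} = \mathcal{G}_i\, \mathcal{G}'_j \big[k^{\mathcal{T}}(x, x') + k^{\mathcal{V}}(x, x')\big], \qquad i, j = 1, \dots, n,$$

where $\mathcal{G}_i$ acts on the first argument $x$ and $\mathcal{G}'_j$ denotes the same operator acting on the second argument $x'$.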
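How applying the Lagrangian operator to an energy prior yields a full torque covariance can be sketched with a finite set of energy features. Assume, purely for illustration, that the Lagrangian is a weighted sum of features with independent standard normal weights; its kernel is then the inner product of the feature vectors, and applying the operator to both kernel arguments amounts to taking the inner product of the operator-transformed features. The two-joint feature set in `features`, the step `H` and all function names are hypothetical, and the derivatives are taken numerically rather than analytically.

```python
import math

H = 1e-3  # finite-difference step (assumed)

def features(z):
    """Hypothetical energy features of z = [q1, q2, qd1, qd2]."""
    q1, q2, v1, v2 = z
    return [v1 * v1, v1 * v2 * math.cos(q2), v2 * v2,
            math.cos(q1), math.cos(q1 + q2)]

def shift(z, a, da, b, db):
    w = list(z)
    w[a] += da
    w[b] += db
    return w

def d1(f, z, a):
    """Central first derivative of f with respect to z[a]."""
    return (f(shift(z, a, H, a, 0.0)) - f(shift(z, a, -H, a, 0.0))) / (2.0 * H)

def d2(f, z, a, b):
    """Central second derivative of f with respect to z[a] and z[b]."""
    return (f(shift(z, a, H, b, H)) - f(shift(z, a, H, b, -H))
            - f(shift(z, a, -H, b, H)) + f(shift(z, a, -H, b, -H))) / (4.0 * H * H)

def lagrange_operator(f, q, qd, qdd, i):
    """Euler-Lagrange operator for joint i applied to a scalar function f."""
    n = len(q)
    z = list(q) + list(qd)
    out = -d1(f, z, i)
    for j in range(n):
        out += d2(f, z, n + i, n + j) * qdd[j]
        out += d2(f, z, n + i, j) * qd[j]
    return out

def torque_features(x):
    q, qd, qdd = x
    m = len(features(list(q) + list(qd)))
    return [[lagrange_operator(lambda z, k=k: features(z)[k], q, qd, qdd, i)
             for k in range(m)] for i in range(len(q))]

def torque_kernel(x, x_prime):
    """n-by-n covariance between the torques at x and at x_prime."""
    a, b = torque_features(x), torque_features(x_prime)
    return [[sum(u * v for u, v in zip(ra, rb)) for rb in b] for ra in a]

x = ([0.3, -0.5], [0.4, 0.2], [1.0, -0.7])
K = torque_kernel(x, x)
print(K)
```

The off-diagonal entry `K[0][1]` is non-zero because several energy features depend on both joints, which is the correlation that independent per-joint GPs cannot represent.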
Therefore, some embodiments of the present disclosure model the inverse dynamics function as an unknown multi-input multi-output function $f(x): \mathbb{R}^{3n} \to \mathbb{R}^{n}$.
Some embodiments are based on the realization that the kernel functions $k^{\mathcal{V}}$ and $k^{\mathcal{T}}$ used to define the prior on the potential energy and on the kinetic energy can be formulated as polynomial functions in a space defined by a trigonometric transformation of the state of the mechanical system 109. Such a formulation gives a physics-informed structure to the kernel functions with a compact number of parameters to be learned.
Therefore, at block 309, the kernel function of the kinetic energy and the kernel function of the potential energy are characterized as polynomial functions. The kernel functions $k^{\mathcal{T}}$ and $k^{\mathcal{V}}$ are defined based on two propositions that characterize the kinetic energy $\mathcal{T}$ and the potential energy $\mathcal{V}$ as polynomial functions in a space defined by a trigonometric transformation of the state of the mechanical system 109. The state of the mechanical system 109 may include the positions and velocities of the joints of the mechanical system 109. The trigonometric transformation is defined as follows.
Let $q^i$, $\dot{q}^i$ be vectors including the positions and the velocities of the joints up to index $i$, respectively:

$$q^i = [q_1, \dots, q_i]^T, \qquad \dot{q}^i = [\dot{q}_1, \dots, \dot{q}_i]^T.$$
Nr and Np are, respectively, a number of revolute and prismatic joints, with Nr+Np=n. Sets Ir={r1, . . . , rN
$q_{c_b}$, $q_{s_b}$ and $q_{p_b}$ denote the b-th element of $q_c$, $q_s$ and $q_p$, respectively.
Next, let $I_r^i$ (resp. $I_p^i$) be the subset of $I_r$ (resp. $I_p$) composed of the indexes lower than or equal to $i$, and let the vectors $q_c^i$, $q_s^i$ (resp. $q_p^i$) be defined as the restriction of $q_c$, $q_s$ (resp. $q_p$) to $I_r^i$ (resp. $I_p^i$). For the sake of clarity, consider the following example. Let index i be such that rj≤i<rj+1 for some 1≤j<rN
According to an embodiment, the kernel function of the potential energy is a polynomial function in a space defined by a trigonometric transformation of the state of the mechanical system 109. The following proposition establishes that the potential energy is polynomial with respect to a set of variables =(qc, qs, qp) that are functions of the joint positions vector q. is the trigonometric transformation of the state of the mechanical system 109 that is the space over which the potential energy is a polynomial function.
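The polynomial structure described above can be checked on a small case. The following is a minimal sketch, assuming a hypothetical planar arm with two revolute joints whose gravity potential is a weighted sum of the heights of two lumped masses. The function names, the index lists `revolute` and `prismatic`, and the constants `a` and `b` are illustrative assumptions and are not taken from the disclosure.

```python
import math

def trig_transform(q, revolute, prismatic):
    """Map joint positions q to (qc, qs, qp): cosines and sines of the
    revolute joints and the raw displacements of the prismatic joints."""
    qc = [math.cos(q[i]) for i in revolute]
    qs = [math.sin(q[i]) for i in revolute]
    qp = [q[i] for i in prismatic]
    return qc, qs, qp

def potential_direct(q, a=2.0, b=0.5):
    """Gravity potential of the hypothetical 2R arm written in joint angles.
    a and b lump mass, gravity and link lengths (illustrative values)."""
    return a * math.sin(q[0]) + b * math.sin(q[0] + q[1])

def potential_polynomial(qc, qs, a=2.0, b=0.5):
    """The same potential as a polynomial in (qc, qs). In every monomial the
    cosine/sine pair of each joint appears with degree at most one."""
    return a * qs[0] + b * (qs[0] * qc[1] + qc[0] * qs[1])

# Both expressions agree at an arbitrary configuration.
q_example = (0.3, -1.2)
qc, qs, qp = trig_transform(q_example, revolute=[0, 1], prismatic=[])
difference = abs(potential_direct(q_example) - potential_polynomial(qc, qs))
```

The angle-sum identity is what turns the trigonometric dependence on q into a polynomial dependence on the transformed variables, which is the property the kernel construction relies on.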
Proposition-1: Consider a manipulator with n+1 links and n joints. Total potential energy (q) belongs to space (n)(qc
To comply with constraints on the maximum degree of each term, the kernel function (x, x′) is defined as a product of Nr+Np inhomogeneous polynomial kernels where Nr kernels have p=1 and each of them is defined on the 2-dimensional input space given by qcs
Each of n kernels accounts for contribution of a distinct joint, and the potential energy kernel (x, x′) spans (n)(qc
Further, some embodiments are based on the observation that the total kinetic energy is the sum of the kinetic energies relative to each link, that is,
$$\mathcal{T}(q, \dot{q}) = \sum_{i=1}^{n} \mathcal{T}_i(q, \dot{q}).$$
Proposition-2: Consider a manipulator with n+1 links and n joints. The kinetic energy (q, {dot over (q)}) of link i belongs to (2i+2)(qc
To comply with constraints and properties stated in the above proposition, a kernel function given by a product of i inhomogeneous polynomial kernels and 1 homogeneous kernel is adopted, where
Resulting kernel function for the kinetic energy is then given by
At block 311 of the method 300, the Lagrangian polynomial kernel, kτ, that defines the inverse dynamics model 107 is computed based on the Lagrangian operator, the kernel function of the potential energy (eqn. A) and the kernel function of the kinetic energy (eqn. B) characterized as the polynomial functions
In an embodiment, the inputs of the energy-based inverse dynamics model 107 during training are the data q, {dot over (q)}, {umlaut over (q)} 405 and the labels are the torque data 406. The one or more hyperparameters can be trained based on any machine learning algorithm. In some embodiments, the one or more hyperparameters are learned with a machine learning algorithm using maximization of the marginal likelihood. The one or more hyperparameters are the parameters that define the kernel function of the Gaussian process. For example, in a standard squared exponential kernel, the one or more hyperparameters may be the scaling factors and length scales of the squared exponential function. In some embodiments, the one or more hyperparameters may be coefficients of the polynomial functions that define the Lagrangian kernel kℒ(x, x′).
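As an illustration of hyperparameter learning by marginal likelihood maximization, the following minimal sketch scores a few candidate length scales of a standard squared exponential kernel on synthetic scalar data and keeps the best one. It is not the training procedure of the disclosure: the data, the candidate grid, the fixed noise variance, and all function names are assumptions made for this example, and a practical implementation would typically use a gradient-based optimizer and the torque kernel instead.

```python
import numpy as np

def squared_exponential(xa, xb, scale, length):
    """Squared exponential kernel matrix between 1-D input arrays xa and xb."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return scale * np.exp(-0.5 * d2 / length**2)

def neg_log_marginal_likelihood(K, y, noise_var):
    """Negative log marginal likelihood of zero-mean GP targets y with
    prior covariance K and i.i.d. Gaussian noise of variance noise_var."""
    n = y.shape[0]
    chol = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(chol))) + 0.5 * n * np.log(2.0 * np.pi)

def select_length_scale(x, y, candidates, scale=1.0, noise_var=1e-2):
    """Grid search: return the candidate with the lowest negative log
    marginal likelihood, together with all scores."""
    scores = [
        neg_log_marginal_likelihood(squared_exponential(x, x, scale, c), y, noise_var)
        for c in candidates
    ]
    return candidates[int(np.argmin(scores))], scores

# Synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)
candidates = [0.05, 0.3, 1.0, 3.0]
best_length, scores = select_length_scale(x, y, candidates)
```

The same objective applies unchanged when the kernel matrix is built from a different kernel, since only `K`, the targets and the noise variance enter the computation.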
According to some embodiments, the inverse dynamics model 107 is a multi-input-multi-output (MIMO) torque estimator model that produces the torques for the different actuators based on the positions of the joints of the mechanical system 109, and velocity and acceleration of the joints providing multiple degrees of freedom. In other words, the positions of the joints, and velocity and acceleration of the multiple degrees of freedom are applied as input to the MIMO torque estimator, and the MIMO torque estimator outputs the torques for the different actuators.
Additionally, or alternatively, in some embodiments, the inverse dynamics model 107 is used to estimate the kinetic energy and the potential energy from torque measurements. For instance, the processor 103 may be configured to process the states of the different actuators with the trained inverse dynamics model 107 to estimate the kinetic energy of the mechanical system 109 and the potential energy of the mechanical system 109. According to some embodiments, after the training of the inverse dynamics model 107, the estimate 409 of the kinetic energy and the estimate 410 of the potential energy may be computed without requiring further training. That is, the training of the model 107 does not require labeled data for the kinetic and/or potential energy. Indeed, the hyperparameters of the torque kernel kθτ are the hyperparameters of the Lagrangian kernel kℒ(x, x′), that is, the sum of the potential energy kernel and the kinetic energy kernel as described previously in this disclosure. According to some embodiments, the potential energy can be computed by combining the quantity 407 obtained during the estimation of the inverse dynamics of the system, the potential kernel in 408 and the linear operator that maps the Lagrangian function to torques determined at step 305 of
Consider a mechanical system with n DOF and let q∈ℝ^n be the vector of generalized coordinates. The system may have m control inputs (where m<n), each of which actuates a single DOF. The vector q may be partitioned as qT=[q1T, q2T], where q1∈ℝ^m and q2∈ℝ^{n−m} refer respectively to the actuated and the non-actuated DOFs. Under the rigid body assumption, the inverse dynamics of the underactuated system can be derived from the Euler-Lagrange equations as:
The inverse dynamics identification problem consists of estimating the map in eq. (1) that relates {tilde over (x)}=(q, {dot over (q)}, {umlaut over (q)}) and the torques τ from a set of noisy measurements. Black-box solutions treat the inverse dynamics as an unknown function and, generally, rely on universal approximators to estimate the function from experimental data. According to some embodiments, GPR, which is a framework for Bayesian inference widely used in machine learning and robotics, may be adopted in this regard.
The standard approach when using GPR consists of considering the different torque components independently and solving n independent regression problems, one for each generalized coordinate. However, with underactuated systems, the torques of the non-actuated dimensions are constant signals equal to zero, leading to an ill-posed estimation problem. This black-box setup prevents deriving any inverse dynamics model useful for model-based control strategies.
A particular class of robots described by eq. (1) is known as balancing systems. Common examples of balancing systems are the Cartpole, the Furuta Pendulum, the Acrobot, and the Pendubot. Within such systems, the typical control challenge requires swinging up and balancing the robot in the unstable equilibrium point, hereafter denoted by x★=[q★T, {dot over (q)}★T]T with {dot over (q)}★=0.
The first step consists of a partial feedback linearization. From eq. (1) the dynamics of the actuated and non-actuated subsystems are isolated respectively as
Eq. (3) may be solved for {umlaut over (q)}2 as {umlaut over (q)}2=−M22−1(M21{umlaut over (q)}1+c2+g2) since M22 is invertible (given that M>0). Substituting the resulting expression into eq. (2) leads to
A feedback linearizing controller for eq. (4) can be defined as
A linear second-order dynamics for the actuated subsystem may be obtained. Selecting u according to
𝒯(q, {dot over (q)}) is the kinetic energy, while 𝒱(q) denotes the potential energy. However, the choice of fe depends on the system of interest.
It may be noted that the controller presented in this section does not stabilize the system to a fixed point but only to a manifold. For this reason, in applications such as the swing-up of balancing robots, the control must switch to another controller that achieves local asymptotic stability of the equilibrium.
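The partial feedback linearization step of this section can be sketched numerically. The sketch below assumes that eq. (1) has the usual block form, with inertia matrix M, Coriolis vector c and gravity vector g partitioned into actuated (first m entries) and non-actuated parts, and with zero torque on the non-actuated DOFs. The function names and the numeric values are hypothetical and serve only to check that the computed torque imposes the commanded acceleration u on the actuated DOFs.

```python
import numpy as np

def partial_feedback_linearization(M, c, g, m, u):
    """Torque on the m actuated DOFs that imposes ddq1 = u.

    Eliminates ddq2 = -M22^{-1} (M21 ddq1 + c2 + g2) from the actuated rows."""
    M11, M12 = M[:m, :m], M[:m, m:]
    M21, M22 = M[m:, :m], M[m:, m:]
    # Solve with M22 instead of forming its inverse explicitly.
    M22inv_M21 = np.linalg.solve(M22, M21)
    M22inv_h2 = np.linalg.solve(M22, c[m:] + g[m:])
    M_bar = M11 - M12 @ M22inv_M21
    h_bar = c[:m] + g[:m] - M12 @ M22inv_h2
    return M_bar @ u + h_bar

def forward_dynamics(M, c, g, m, tau1):
    """Accelerations produced by actuated torques tau1 (zero torque on passive DOFs)."""
    tau = np.concatenate([tau1, np.zeros(M.shape[0] - m)])
    return np.linalg.solve(M, tau - c - g)

# Hypothetical 3-DOF system with 2 actuated DOFs (values chosen for illustration).
M = np.array([[3.0, 0.4, 0.2], [0.4, 2.0, 0.3], [0.2, 0.3, 1.5]])
c = np.array([0.1, -0.2, 0.05])
g = np.array([1.0, 0.5, -0.3])
u = np.array([0.7, -1.1])
tau1 = partial_feedback_linearization(M, c, g, m=2, u=u)
ddq = forward_dynamics(M, c, g, m=2, tau1=tau1)  # ddq[:2] reproduces u
```

In the disclosure the quantities M, c and g are not known analytically, so estimates of them would take their place in such a computation.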
To stabilize the system at the equilibrium x★ we resort to a Linear Quadratic Regulator (LQR). First, a state space description for the system may be provided with dynamics expressed in eq. (1). Let the system state x be such that xT=[qT, {dot over (q)}T]. The state evolution can be derived from eq. (1) as
Then the non-linear system in eq. (10) may be linearized around x★. Moreover, let τ★ be the reference input at the equilibrium. Applying a first-order Taylor expansion, the system dynamics around (x★, τ★) can be approximated as
Recalling that at the equilibrium {dot over (q)}★=0, matrices A and B are
Then, the infinite horizon control problem on the linearized system is, namely,
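For illustration, an infinite-horizon LQR gain for a linearized model can be computed as in the following minimal sketch. It assumes a continuous-time formulation with a quadratic cost weighted by matrices Q and R, which the text does not specify, and it solves the algebraic Riccati equation through the stable invariant subspace of the Hamiltonian matrix. The matrices A and B below describe a hypothetical linearized inverted pendulum and are not taken from the disclosure.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Continuous-time infinite-horizon LQR.

    Returns (K, P), where P solves A^T P + P A - P B R^{-1} B^T P + Q = 0 and
    the feedback is tau = tau_star - K (x - x_star)."""
    n = A.shape[0]
    R_inv_Bt = np.linalg.solve(R, B.T)
    # Hamiltonian matrix of the optimal control problem.
    H = np.block([[A, -B @ R_inv_Bt], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]  # n columns spanning the stable subspace
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return R_inv_Bt @ P, P

# Hypothetical linearization about the upright equilibrium (illustrative values).
A = np.array([[0.0, 1.0], [9.81, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = lqr_gain(A, B, Q, R)
closed_loop_poles = np.linalg.eigvals(A - B @ K)
```

A library Riccati solver would normally be preferred in practice; the eigenvector construction is used here only to keep the sketch dependent on NumPy alone.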
The LIP estimator is based on GP, which is a framework for Bayesian inference widely used in machine learning and robotics applications. Generally, GPR solutions for inverse dynamics identification model each torque component τi({tilde over (x)}) as a Gaussian Process (GP) by assuming τi({tilde over (x)})s are independent given {tilde over (x)}, and then apply standard GPR inference. As discussed in the previous section, this approach is not effective for underactuated systems. The LIP estimator follows an alternative strategy and defines the kinetic and potential energies as two independent GPs. Then, it derives a multi-output kernel of the torques by exploiting EL equations. In this way, the inverse dynamics problem is well defined also in the underactuated setup.
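The strategy described above can be illustrated on a 1-DOF pendulum. The sketch below is a simplified stand-in, not the LIP estimator itself: a small hand-picked feature map replaces the polynomial kernels, the energies are given independent unit Gaussian priors on their feature weights, and the torque kernel is obtained by applying the Euler-Lagrange operator to each feature of the Lagrangian. The pendulum constants, the feature choices and all names are assumptions made for this example. It also shows that estimates of the kinetic and potential energies follow from torque data alone through the cross-covariances between the energies and the torques.

```python
import numpy as np

# Hypothetical pendulum used only to generate data:
# T = 0.5 * M_L2 * dq^2, V = -MGL * cos(q), tau = M_L2 * ddq + MGL * sin(q).
M_L2, MGL = 0.8, 3.0

def kinetic_features(q, dq):
    """Features of the kinetic energy prior (one monomial in dq)."""
    return np.stack([dq**2], axis=-1)

def potential_features(q):
    """Features of the potential energy prior in the transformed variables."""
    return np.stack([np.cos(q), np.sin(q)], axis=-1)

def torque_features(q, dq, ddq):
    """Euler-Lagrange operator d/dt(dL/ddq) - dL/dq applied to each feature of L = T - V."""
    from_kinetic = np.stack([2.0 * ddq], axis=-1)                 # d/dt(2 dq)
    from_potential = np.stack([-np.sin(q), np.cos(q)], axis=-1)   # +dV/dq per feature
    return np.concatenate([from_kinetic, from_potential], axis=-1)

# Noise-free torque "measurements" at random training inputs.
rng = np.random.default_rng(0)
q, dq, ddq = rng.uniform(-3.0, 3.0, (3, 60))
tau = M_L2 * ddq + MGL * np.sin(q)

Psi = torque_features(q, dq, ddq)        # (N, 3) torque features
K_tau = Psi @ Psi.T                      # torque kernel matrix
alpha = np.linalg.solve(K_tau + 1e-6 * np.eye(len(tau)), tau)

def estimate(q_s, dq_s, ddq_s):
    """Posterior means of torque, kinetic energy and potential energy."""
    tau_hat = torque_features(q_s, dq_s, ddq_s) @ Psi.T @ alpha
    T_hat = kinetic_features(q_s, dq_s) @ Psi[:, :1].T @ alpha   # cov(T, tau) term
    V_hat = potential_features(q_s) @ Psi[:, 1:].T @ alpha       # cov(V, tau) term
    return tau_hat, T_hat, V_hat

q_s, dq_s, ddq_s = rng.uniform(-3.0, 3.0, (3, 5))
tau_hat, T_hat, V_hat = estimate(q_s, dq_s, ddq_s)
```

Because the kinetic and potential features carry separate weights, the two energies are independent a priori, and the single regression on torques determines both of them without any energy labels.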
Some embodiments model the kinetic energy 𝒯 and the potential energy 𝒱 as independent GPs, namely 𝒯˜GP(0, k𝒯(⋅,⋅)) and 𝒱˜GP(0, k𝒱(⋅,⋅)), where k𝒯 and k𝒱 are the kernel functions that define the covariances of 𝒯 and 𝒱. For instance, let {tilde over (x)} and {tilde over (x)}′ be two input locations; then the covariance between the values of 𝒯 at {tilde over (x)} and {tilde over (x)}′ is E[𝒯({tilde over (x)})𝒯({tilde over (x)}′)]=k𝒯({tilde over (x)}, {tilde over (x)}′). For convenience, the kernel matrix of 𝒯, that is, the matrix that collects k𝒯 evaluated at X={{tilde over (x)}1, . . . , {tilde over (x)}N}, X′={x′1, . . . , x′M}, is given as:
Similarly,
The LIP estimator defines the kinetic energy kernel k𝒯 and the potential energy kernel k𝒱 relying on a polynomial formulation as described previously with reference to
It may be noted that (i) since the kinetic energy 𝒯 and the potential energy 𝒱 are defined as zero-mean GPs, by the properties of GPs the Lagrangian function ℒ=𝒯−𝒱 is also a zero-mean GP with kernel kℒ(⋅,⋅)=k𝒯(⋅,⋅)+k𝒱(⋅,⋅), namely ℒ˜GP(0, kℒ(⋅,⋅)). Furthermore, (ii) under rigid body assumptions, each τi({tilde over (x)}) is described by a linear differential equation of ℒ, namely,
Expanding the derivatives with respect to time explicitly gives:
Based on (i), (ii), and (iii), some embodiments realize that the torques are a zero-mean GP, with covariance defined by a multi-output kernel kτ({tilde over (x)}, {tilde over (x)}′)∈ℝ^{n×n} that encodes the EL equations, and is expressed as
To derive (14), the multi-output version of property (iii) may be applied. This is also described previously with reference to
Once kτ is defined, torque estimates may be computed following standard GPR. Let X be a set of N training input locations, and y=[y1T, . . . , yNT]T the respective torque measurements, with yi∈ℝ^n equal to the torque measurements at input {tilde over (x)}i. The LIP torque estimate at a general input location {tilde over (x)} is
Next, the estimation of the kinetic and potential energies, of the inertial, Coriolis, and gravity vectors, and of δg/δq, as required by the control laws presented with reference to eq. (9), is described.
The kinetic energy 𝒯 and the potential energy 𝒱 are required to implement the energy-based control law described by eq. (9). The LIP model provides a principled way to estimate them from the torque measurements y. Indeed, within the LIP framework, 𝒯, 𝒱, and τ are jointly Gaussian distributed, since the prior of τ is derived by applying the linear operator to the kinetic and potential GPs 𝒯 and 𝒱. The covariances between 𝒯 and τ and between 𝒱 and τ at general input locations {tilde over (x)} and {tilde over (x)}′ are
Recalling that 𝒯 and 𝒱 are modelled as independent GPs, and in view of the properties of GPs under linear operators,
The Gaussian properties make the posterior distributions of 𝒯 and 𝒱 given y known analytically. At any general input location {tilde over (x)}, these posteriors are Gaussian distributions and are therefore exactly defined by their mean and variance. The means are computed as
From the posterior distributions of 𝒯 and 𝒱 given y, an estimate of the energies at an arbitrary input location {tilde over (x)} may be obtained as
Referring to
Referring to
The energy-based control law in eq. (5) requires estimating m, c and g, while the LQR described previously requires the inverse of the inertia matrix M as well as the term δg/δq. These quantities are derived similarly to how the energies are computed as described in the previous section.
First, the inertia matrix is estimated component wise. The element in position ij of M is
Accordingly, Mij may be estimated at any input location {tilde over (x)} as
with
Then, from the estimate of the inertia matrix an estimate of the inertial torque component may be derived as {circumflex over (m)}({tilde over (x)})={circumflex over (M)}({tilde over (x)}){umlaut over (q)}.
Next, the gravity contribution g may be estimated. Recall that the i-th component of the vector g is defined as
The covariance between gi and τ at general input locations {tilde over (x)} and {tilde over (x)}′ is
Accordingly gi can be estimated at any input location {tilde over (x)} as
with
Given the estimates of m and g, it is possible to obtain an estimate of the Coriolis and centripetal contribution c as ĉ({tilde over (x)})={circumflex over (τ)}({tilde over (x)})−{circumflex over (m)}({tilde over (x)})−ĝ({tilde over (x)}).
Finally, the matrix δg/δq, required in the computation of matrix A in eq. (12), is estimated following the same procedure adopted for the inertia matrix. Its element in position ij is given by
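The decomposition of the estimated torque into its inertial, gravity and Coriolis contributions amounts to simple arithmetic once the individual estimates are available. The following minimal sketch uses hypothetical numeric values for a 2-DOF system in place of actual estimates; the function name and the numbers are assumptions made for illustration.

```python
import numpy as np

def split_torque(tau_hat, M_hat, g_hat, ddq):
    """Return the inertial term M_hat @ ddq and the Coriolis/centripetal
    term obtained as the remainder tau_hat - m_hat - g_hat."""
    m_hat = M_hat @ ddq
    c_hat = tau_hat - m_hat - g_hat
    return m_hat, c_hat

# Hypothetical estimates at one input location (illustrative values).
M_hat = np.array([[2.0, 0.5], [0.5, 1.0]])
ddq = np.array([1.0, -2.0])
tau_hat = np.array([3.0, 0.0])
g_hat = np.array([1.5, 0.5])
m_hat, c_hat = split_torque(tau_hat, M_hat, g_hat, ddq)
# m_hat is [1.0, -1.5] and c_hat is [0.5, 1.0]
```

Since the Coriolis term is obtained as a remainder, any error in the torque, inertia or gravity estimates propagates directly into it.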
Then, the covariance between Gij({tilde over (x)}) and the torque vector τ at general input locations {tilde over (x)} and {tilde over (x)}′ is
Finally, the estimate at a general input location {tilde over (x)} is computed as
with
Some embodiments are based on the realization that the estimated kinetic energy and the estimated potential energy can be used to detect an anomaly in the underactuated mechanical system 602 during operation of the mechanical system 602 (e.g., while performing the task). The anomaly detection based on the estimated kinetic energy and the estimated potential energy of the system 602 is explained next.
Further, the method 700 comprises comparing 703 the estimated kinetic energy and the estimated potential energy with a respective threshold. Towards this end, the estimated kinetic energy may be compared with a first threshold and the estimated potential energy may be compared with a second threshold. Based on such comparison, the anomaly is detected 705. For instance, at block 703, it may be checked if the estimated kinetic energy and the estimated potential energy are greater than the respective threshold. If the estimated kinetic energy and the estimated potential energy are greater than the first threshold and the second threshold, respectively, then, at block 705, it is inferred that the anomaly is detected. If the estimated kinetic energy and the estimated potential energy are not greater than the first threshold and the second threshold, respectively, then, at block 707, it is inferred that no anomaly is detected.
Alternatively, in some embodiments, it may be checked if the estimated kinetic energy and the estimated potential energy are less than the first threshold and the second threshold, respectively. If the estimated kinetic energy and the estimated potential energy are less than the first threshold and the second threshold, respectively, then it is inferred that the anomaly is detected. If the estimated kinetic energy and the estimated potential energy are not less than the first threshold and the second threshold, respectively, then it is inferred that no anomaly is detected. If the estimated kinetic energy is greater than the first threshold and the estimated potential energy is less than the second threshold, then a fault detection is inferred as a special case of anomaly detection depending only on potential energy faults. If the estimated potential energy is greater than the second threshold and the estimated kinetic energy is less than the first threshold, then a fault detection is inferred as a special case of anomaly detection depending only on kinetic energy faults.
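The threshold comparison described above can be sketched as a small function. This is a minimal illustration in which the function name, the `mode` argument and the numeric values are hypothetical. It covers only the two cases in which both energies cross their thresholds in the same direction, not the mixed special cases.

```python
def detect_anomaly(kinetic, potential, first_threshold, second_threshold, mode="greater"):
    """Return True when both energy estimates cross their respective thresholds.

    mode="greater": anomaly when both estimates exceed their thresholds.
    mode="less": alternative embodiment, anomaly when both fall below them."""
    if mode == "greater":
        return kinetic > first_threshold and potential > second_threshold
    if mode == "less":
        return kinetic < first_threshold and potential < second_threshold
    raise ValueError("mode must be 'greater' or 'less'")

# Illustrative values only.
anomaly_high = detect_anomaly(5.2, 9.1, first_threshold=4.0, second_threshold=8.0)
anomaly_mixed = detect_anomaly(5.2, 7.0, first_threshold=4.0, second_threshold=8.0)
anomaly_low = detect_anomaly(1.0, 2.0, 4.0, 8.0, mode="less")
```

How the thresholds are chosen is outside the scope of this sketch; they are treated as given inputs.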
Additionally, in some embodiments, the estimated potential energy and the estimated kinetic energy may be used to adjust the control commands for the underactuated mechanical system 602. For instance, based on the estimated potential energy and the estimated kinetic energy, a passivity controller may be designed to control the underactuated mechanical system 602. The passivity controllers are effective to control mechanical systems but require precise models of the energies to define Hamiltonian or Lagrangian of the system. Some embodiments can estimate accurate models of the energies that result in accurate passivity controllers.
Additionally, in an embodiment, a processor may be configured to determine, based on the estimated kinetic energy and the estimated potential energy, a motion plan that consumes a minimum amount of energy for performing the task. The motion plan may include a trajectory for performing the task. Such an embodiment is described below in
The memory 905 can store instructions that are executable by the computing device 900 and any data that can be utilized by the methods and systems of the present disclosure. The memory 905 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. The memory 905 can be a volatile memory unit or units, and/or a non-volatile memory unit or units. The memory 905 may also be another form of computer-readable medium, such as a magnetic or optical disk.
The storage device 907 can be adapted to store supplementary data and/or software modules used by the computing device 900. The storage device 907 can include a hard drive, an optical drive, a thumb-drive, an array of drives, or any combinations thereof. Further, the storage device 907 can contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, the processor 903), perform one or more methods, such as those described above.
The computing device 900 can be linked through the bus 909, optionally, to a display interface or user interface (HMI) 947 adapted to connect the computing device 900 to a display device 949 and a keyboard 951, wherein the display device 949 can include a computer monitor, camera, television, projector, or mobile device, among others. In some implementations, the computing device 900 may include a printer interface to connect to a printing device, wherein the printing device can include a liquid inkjet printer, solid ink printer, large-scale commercial printer, thermal printer, UV printer, or dye-sublimation printer, among others.
The high-speed interface 911 manages bandwidth-intensive operations for the computing device 900, while the low-speed interface 913 manages lower bandwidth-intensive operations. Such an allocation of functions is only an example. In some implementations, the high-speed interface 911 can be coupled to the memory 905, the user interface (HMI) 947, and to the keyboard 951 and the display 949 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 915, which may accept various expansion cards via the bus 909. In an implementation, the low-speed interface 913 is coupled to the storage device 907 and the low-speed expansion ports 917, via the bus 909. The low-speed expansion ports 917, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to the one or more input/output devices 941. The computing device 900 may be connected to a server 953 and a rack server 955. The computing device 900 may be implemented in several different forms. For example, the computing device 900 may be implemented as part of the rack server 955.
In accordance with several embodiments, the controller 101 is configured to control the mechanical system 109 using the inverse dynamics model 107. The inverse dynamics model 107 models the energy of the mechanical system 109. Modeling the energy captures mutual effects of the torques of the different actuators on each other. Thereby, the inverse dynamics model 107 enables accurate control of the mechanical system 109. Additionally, the formulation of the inverse dynamics model 107 requires minimal physical information about the mechanical system 109. To that end, the formulation of the inverse dynamics model 107 is computationally inexpensive. Additionally, or alternatively, the inverse dynamics model 107 can be used to estimate the kinetic energy of the mechanical system 109 and the potential energy of the mechanical system 109.
The above description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicate like elements. Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks. Various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Embodiments of the present disclosure may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts concurrently, even though shown as sequential acts in illustrative embodiments. Further, use of ordinal terms such as “first,” “second,” in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the aspect of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.
This application is a continuation-in-part of U.S. patent application Ser. No. 18/222,540 filed Jul. 17, 2023, which claims benefit of priority from provisional application Ser. No. 63/469,002 filed May 25, 2023, the contents of which are incorporated by reference herein.
Number | Date | Country
---|---|---
63469002 | May 2023 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 18222540 | Jul 2023 | US
Child | 18772131 | | US