The present invention generally relates to controller systems and more specifically relates to controller systems that utilize deep neural networks.
Control systems can be used to manage or otherwise control a device (or plant), such as unmanned aerial vehicles (UAV), robotic systems, autonomous cars, etc. Control systems can adjust control inputs based on desired objectives. Feedback control systems can adjust control inputs based on output (or feedback) from the controlled device, in addition to the desired objectives. However, when a device is affected by random or complex environmental variables (e.g., weather), it can be difficult to model such variables, leading to poor performance in dynamic conditions.
Systems and methods for learning based control in accordance with embodiments of the invention are illustrated. One embodiment includes a method for training an adaptive controller. The method includes steps for receiving a set of training data that includes several training samples, wherein each training sample includes a state and a true uncertain effect value. The method includes steps for computing an uncertain effect value based on the state, computing a set of one or more losses based on the true uncertain effect value and the computed uncertain effect value, and updating the adaptive controller based on the computed set of losses.
In a further embodiment, the true uncertain effect value is a disturbance force caused by at least one of the group consisting of ground effects and wind conditions.
In still another embodiment, the adaptive controller includes a set of one or more deep neural networks (DNNs).
In a still further embodiment, updating the adaptive controller includes backpropagating the computed set of losses through the set of DNNs.
In yet another embodiment, updating the adaptive controller includes updating at least one layer of the set of DNNs using spectrally normalized weight matrices.
In a yet further embodiment, updating the adaptive controller includes updating each layer of the set of DNNs using spectrally normalized weight matrices to constrain the Lipschitz constant of the set of DNNs.
In another additional embodiment, computing the set of losses includes computing at least one of a group consisting of a position tracking error and a prediction error.
In a further additional embodiment, the state includes at least one of the group consisting of an attitude, a global position, and a velocity.
In another embodiment again, computing an uncertain effect value of an environment includes determining a set of kernel functions that approximate the uncertain effect value.
In a further embodiment again, the adaptive controller includes a set of one or more deep neural networks (DNNs) with Rectified Linear Unit (ReLU) activation, computing an uncertain effect value of an environment includes determining a set of kernel functions that approximate the uncertain effect value utilizing the set of DNNs, and updating the adaptive controller includes updating each layer of the set of DNNs using spectrally normalized weight matrices to constrain the Lipschitz constant of the set of DNNs.
One embodiment includes a method for online adaptation of an adaptive controller. The method includes steps for receiving a set of inputs that includes a desired state for a quadrotor, predicting uncertain effects using a model that includes several layers, generating control inputs based on the predicted uncertain effects, receiving an updated state, computing a set of one or more losses based on the updated state and the desired state, and updating a subset of the several layers of the model based on the computed set of losses.
In still yet another embodiment, the uncertain effects include a disturbance force caused by at least one of the group consisting of ground effects and wind conditions.
In a still yet further embodiment, the model includes a set of one or more deep neural networks (DNNs), wherein updating the model includes backpropagating the computed set of losses through the set of DNNs.
In still another additional embodiment, updating the model includes updating weights for only one layer of the set of DNNs.
In a still further additional embodiment, updating weights for the only one layer includes using spectrally normalized weight matrices.
In still another embodiment again, updating the model includes updating each layer of the set of DNNs using spectrally normalized weight matrices to constrain the Lipschitz constant of the set of DNNs.
In a still further embodiment again, computing the set of losses includes computing at least one of a group consisting of a position tracking error and a prediction error.
In yet another additional embodiment, the set of inputs includes a desired state for the quadrotor and a current state for the quadrotor, wherein each state for the quadrotor includes at least one of the group consisting of an attitude, a global position, and a velocity.
In a yet further additional embodiment, predicting uncertain effects includes determining a set of kernel functions that approximate the uncertain effects.
In yet another embodiment again, the model includes a set of one or more deep neural networks (DNNs) with Rectified Linear Unit (ReLU) activation, predicting uncertain effects includes determining a set of kernel functions that approximate the uncertain effects utilizing the set of DNNs, and updating the adaptive controller includes updating each layer of the set of DNNs using spectrally normalized weight matrices to constrain the Lipschitz constant of the set of DNNs.
Additional embodiments and features are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the specification or may be learned by the practice of the invention. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, which form a part of this disclosure.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.
For a given dynamical system, complexity and uncertainty can arise either from its inherent properties or from the changing environment. Thus, model accuracy is often key in designing a high-performance and robust control system. If the model structure is known, conventional system identification techniques can be used to resolve the parameters of the model. When the system becomes too complex to model analytically, modern machine learning research turns to data-driven and neural network approaches that often result in state-of-the-art performance, given enough samples, proper tuning, and adequate time for training. However, a learning-based control system can simultaneously call for both representation power and fast execution. Adaptive control is a control method that can be used by controllers to adapt to a controlled system with uncertain parameters. Successes in adaptive control have been seen using simple linear-in-parameter models with provably robust control designs. On the other hand, the field of machine learning has made its own progress toward a fast online paradigm, with rising interest in few-shot learning, continual learning, and meta-learning.
An example of a control system in accordance with an embodiment of the invention is illustrated in
Controller 110 receives inputs as well as feedback to generate control inputs for controlling plant 120. Inputs in accordance with some embodiments of the invention can describe a desired state and/or trajectory for the plant. In many embodiments, feedback can include various data on the actual state of the plant, such as (but not limited to) global position, velocity, acceleration, elevation, and/or altitude. Feedback in accordance with many embodiments of the invention can be measured through sensor readings (e.g., accelerometer, gyroscope, global positioning system (GPS), imaging systems, etc.) of the plant and/or other external sensors.
Controllers in accordance with numerous embodiments of the invention can utilize the received input and/or feedback to generate control inputs for controlling plants, which can then generate additional feedback that can be used by a controller to generate additional control inputs for the plant. In order to generate control inputs, controllers in accordance with a variety of embodiments of the invention can approximate the environment (or system) within which a plant operates, allowing the controller to predict conditions and adjust the control inputs for a plant. System approximation in accordance with numerous embodiments of the invention can be performed using a deep neural network (DNN) trained to approximate (or predict) uncertain conditions in the control system. DNNs for approximating uncertain conditions in accordance with certain embodiments of the invention use Rectified Linear Unit (ReLU) activation, which can converge faster during training and often demonstrates more robust behavior with respect to changes in hyperparameters. Alternatively, or conjunctively, DNNs in accordance with numerous embodiments of the invention can use other activation functions such as, but not limited to, sigmoid, tanh, etc.
A particularly interesting scenario for a control system in a changing environment is a multi-rotor flying in varying wind conditions. Classic multi-rotor control does not consider aerodynamic forces such as drag or ground effects. The thruster direction is controlled to follow the desired acceleration along a trajectory. To account for aerodynamic forces in practice, an integral term is often added to the velocity controller. Other works use incremental nonlinear dynamic inversion (INDI) to estimate external forces through filtered accelerometer measurements, and then apply direct force cancellation in the controller. Some works have assumed a diagonal rotor drag model and proved differential flatness of the system for cancellation, while others used a nonlinear aerodynamic model for force prediction. When a linear-in-parameter (LIP) model is available, adaptive control theories can be applied for controller synthesis. This does not limit the model to only physics-based parameterizations, and a neural network basis can be used. Such models have been applied to multi-rotors for wind disturbance rejection.
As another example, Unmanned Aerial Vehicles (UAVs) often require high-precision control of aircraft positions, especially during landing and take-off. This problem is challenging largely due to complex interactions of rotor and wing airflows with the ground. The aerospace community has long identified the ground effect, which can cause an increased lift force and a reduced aerodynamic drag. These effects can be both helpful and disruptive to flight stability, and the complications are exacerbated with multiple rotors. Therefore, performing automatic landing of UAVs is risk-prone, and can often require expensive high-precision sensors as well as carefully designed controllers. Compensating for ground effect is a long-standing problem in the aerial robotics community. Prior work has largely focused on mathematical modeling as part of system identification (ID). These models are later used to approximate aerodynamic forces during flights close to the ground and combined with controller design for feed-forward cancellation. However, many existing theoretical ground effect models are derived based on steady-flow conditions, whereas most practical cases exhibit unsteady flow. Alternative approaches, such as integral or adaptive control methods, often suffer from slow response and delayed feedback. Some methods employ Bayesian Optimization for open-air control but not for take-off/landing. Given these limitations, the precision of existing fully automated systems for UAVs is still insufficient for landing and take-off, thereby necessitating the guidance of a human UAV operator during those phases.
When adapting to complex system dynamics or a fast-changing environment, a system model needs a high degree of representation power to represent the system accurately, which makes a deep neural network (DNN) a desirable candidate. However, there can be several issues associated with using a deep network for adaptive control purposes. For example, training a DNN often requires backpropagation, easily leading to a computation bottleneck for real-time control on small drones. It can also be challenging to collect sufficient real-world training data, as DNNs are notoriously data-hungry. In addition, continual online training may incur catastrophic interference, where previously learned knowledge is forgotten unintentionally. Due to high dimensionality, DNNs can be unstable and generate unpredictable output, which makes the system susceptible to instability in the feedback control loop, as a vanilla network for a regression problem often does not have guarantees on desirable properties for control design, such as output boundedness and Lipschitz continuity. Further, DNNs are often difficult to analyze, which makes it difficult to design provably stable DNN-based controllers without additionally requiring a potentially expensive discretization step and relying on the native Lipschitz constant of the DNN.
Systems and methods in accordance with several embodiments of the invention can provide online composite adaptive control based on deep neural networks (DNNs). Controllers in accordance with numerous embodiments of the invention can model an environment to generate control inputs that account for effects of the environment (e.g., aerodynamic interactions) on the control of the plant. While some aspects of an environment can be mathematically modeled, complex environments can often include various aspects that are difficult to model with traditional methods. To capture complex aerodynamic interactions without overly-constrained conventional modeling assumptions, processes in accordance with several embodiments of the invention can utilize a machine learning (ML) approach to build a black-box dynamics model using DNNs.
In some embodiments, the unknown part of a dynamics model can be approximated with a DNN trained offline with previously collected data. Processes in accordance with a variety of embodiments of the invention can utilize small, randomly sampled subsets of the training examples with different uncertain effects (e.g., wind conditions, ground effects, etc.) to learn kernel functions that can approximate various uncertain effects. In a variety of embodiments, adaptive controllers can be trained with constrained weight matrices to generate stable and predictable outputs that allow for smooth and accurate control of the plant.
A process for training a stable adaptive controller in accordance with an embodiment of the invention is conceptually illustrated in
Process 200 computes (210) uncertain effects. Uncertain effects in accordance with a variety of embodiments of the invention can include (but are not limited to) ground effects, wind conditions, and other disturbance forces. In a variety of embodiments, processes can compute (or predict) uncertain effects using a set of one or more DNNs initialized with random weights, which are updated by the training process. In a number of embodiments, the process generates a set of parameters such that a linear combination of neural net kernels can represent uncertain effects with small error.
Process 200 computes (215) a set of one or more losses based on the measured effects. Losses in accordance with a number of embodiments of the invention can include position tracking errors, prediction errors, etc. In numerous embodiments, a composite loss composed of multiple different errors can be computed.
Process 200 updates (220) the model based on the computed set of losses. In numerous embodiments, backpropagation can be used to update weights of a deep neural network. Processes in accordance with a variety of embodiments of the invention can update weights of one or more layers of the set of DNNs using spectrally normalized weight matrices. As described below, using spectrally normalized weight matrices for every layer of the DNN can constrain the Lipschitz constant of the set of DNNs, allowing for stable and predictable outputs from the controller.
In several embodiments, DNNs for the uncertain effects can be trained using meta-learning processes that allow deeper layers of the network to capture internal features that are more generally applicable to predicting a variety of different uncertain effects. In various embodiments, training processes can employ model-agnostic meta-learning (MAML) techniques to train adaptive controller models. MAML techniques can facilitate hidden layer outputs becoming good basis functions for online adaptation. In many embodiments, processes can split parameters of neural networks into internal and output layer parameters, with a focus on training the internal layer parameters such that the output layer parameters can be changed to represent a large class of uncertain conditions. In a variety of embodiments, processes can use meta-training data $D_{\text{meta}}=\{D_1, D_2, \dots, D_T\}$, with T sub datasets, where each sub dataset $D_i$ includes $L_i$ state and force measurement pairs, $([q_k, \dot q_k], \hat f(q_k,\dot q_k,c_i))$, generated from some fixed but unknown system condition (wind speed, ground effects, etc.), represented by $c_i$. The goal of meta-learning is to generate a set of parameters, $\Theta=\{\theta_i\}_{i=1}^m$, such that a linear combination of the neural net kernels, $\{\phi_i\}_{i=1}^m$, can represent any wind condition with small error. Meta-learning is described in greater detail below.
Process 200 determines (225) whether there is more training data. When the process determines that there is more training data, the process returns to step 205 to receive more training data. In several embodiments, each set of training data is a subset of a larger training corpus, where each set includes training data captured in a different environmental condition (e.g., different wind speeds). Otherwise, the process ends.
Systems and methods in accordance with numerous embodiments of the invention provide a learning-based controller that can improve the precision of quadrotor landing with guaranteed stability. Controllers in accordance with many embodiments of the invention can directly learn the ground effect on coupled unsteady aerodynamics and vehicular dynamics. In several embodiments, deep learning can be used for system ID of residual dynamics and then integrated with nonlinear feedback linearization control. Although many of the examples described herein describe applications to quadrotors, one skilled in the art will recognize that similar systems and methods can be used in a variety of multi-copter systems, including (but not limited to) hexacopters and octacopters, without departing from this invention.
Adaptive controllers in accordance with some embodiments of the invention can be used to control a quadrotor during take-off, landing, and cross-table maneuvers. Adaptive controllers in accordance with many embodiments of the invention have been shown to be able to land a quadrotor much more accurately than a baseline nonlinear tracking controller with a pre-identified system, decreasing error in the z axis and mitigating x and y drifts by as much as 90% in the landing case. In several embodiments, the learned model can handle temporal dependency, and is an improvement over steady-state theoretical models.
Processes in accordance with some embodiments of the invention can train a model offline (e.g., using a process similar to that described with reference to
Process 300 predicts (310) uncertain effects such as (but not limited to) wind conditions, ground effects, and/or other disturbance forces using a model. In a variety of embodiments, the model includes a set of one or more DNNs that are trained offline to predict disturbance forces based on training data. DNNs in accordance with some embodiments of the invention can be trained with constrained weight matrices (e.g., spectrally normalized weight matrices) that can be used to control the outputs generated by the set of DNNs. In a number of embodiments, models can generate a set of parameters such that a linear combination of neural net kernels can represent uncertain effects with small error.
Process 300 generates (315) control inputs based on the predicted uncertain effects. Control inputs in accordance with several embodiments of the invention can include various control signals for controlling a plant. In certain embodiments, control inputs can include (but are not limited to) thrust commands, attitude commands, and/or squared rotor speeds.
Process 300 receives (320) updated state data for the plant. Updated state in accordance with a number of embodiments of the invention can be received and/or computed from data received from sensors of a plant and/or from sensors external to the plant. In a number of embodiments, sensors can include (but are not limited to) accelerometers, gyroscopes, altimeters, cameras, global positioning service (GPS), etc.
Process 300 computes (325) a set of one or more losses based on the updated state. Losses in accordance with a number of embodiments of the invention can include position tracking errors, prediction errors, etc. In numerous embodiments, a composite loss composed of multiple different errors can be computed.
Process 300 updates (330) the model based on the computed losses. Processes in accordance with numerous embodiments of the invention can update only the last layer weights in a fashion similar to composite adaptive control, or update the last layer's weights more frequently than the rest of the network. This can enable fast adaptation without incurring a high computation burden, as shown in the sketch below. In several embodiments, the online training process is performed continuously with changing conditions. Alternatively, or conjunctively, the online training process can be performed periodically.
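As a concrete illustration only (not a prescribed implementation), the following sketch freezes the hidden layers of a trained network and adapts only the last-layer weights from a composite loss; the function names, shapes, and gains (lam, gamma) are hypothetical:

```python
import numpy as np

def hidden_features(x, hidden_weights):
    """Forward pass through the frozen hidden layers (ReLU), returning
    the basis functions phi(x) used for online adaptation."""
    h = x
    for W in hidden_weights:
        h = np.maximum(W @ h, 0.0)
    return h

def online_update(a, x, f_measured, s, hidden_weights, lam=0.01, gamma=0.1):
    """One online step: update only the last-layer weights `a` using a
    composite loss combining prediction error and tracking error `s`.
    `lam` and `gamma` are illustrative adaptation/regularization gains."""
    phi = hidden_features(x, hidden_weights)   # basis functions
    pred_err = a @ phi - f_measured            # prediction error
    # composite gradient step: prediction + tracking error terms, plus decay
    a -= lam * (np.outer(pred_err, phi) + np.outer(s, phi)) + lam * gamma * a
    return a
```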
While specific processes for training an adaptive controller are described above, any of a variety of processes can be utilized as appropriate to the requirements of specific applications. In certain embodiments, steps may be executed or performed in any order or sequence not limited to the order and sequence shown and described. In a number of embodiments, some of the above steps may be executed or performed substantially simultaneously where appropriate or in parallel to reduce latency and processing times. In some embodiments, one or more of the above steps may be omitted. In numerous embodiments, processes can include steps for offline training in combination with steps for online training.
In several embodiments, DNNs can be trained with layer-wise spectrally normalized weight matrices. Network weights in accordance with several embodiments of the invention can be spectrally normalized during training and/or online adaptation, to constrain the Lipschitz constant of a system approximator, which can be a necessary condition for stable control design. The resulting controller can be shown to be globally exponentially stable under bounded learning errors by exploiting the Lipschitz bound of spectrally normalized DNNs.
Mixed Model for Robot Dynamics
Consider the general robot dynamics model:
$$H(q)\ddot q + C(q,\dot q)\dot q + g(q) + f(q,\dot q;c) = \tau, \qquad (1)$$

where $q,\dot q,\ddot q\in\mathbb{R}^n$ are the n-dimensional position, velocity, and acceleration vectors; $H(q)$ is the symmetric, positive-definite inertia matrix; $C(q,\dot q)\dot q$ is the centripetal and Coriolis torque vector; $g(q)$ is the gravitational torque vector; $f(q,\dot q;c)$ incorporates unmodeled dynamics; and $c=c(t)$ is the hidden state used to represent the changing environment.
In several embodiments, system approximators can approximate unmodeled (or uncertain) dynamics terms with a linear combination of a set of m neural network kernels. Two formulations are considered here. First, $f(q,\dot q;c)$ can be approximated by linearly combining m outputs from m separately trained neural networks $\vec\phi_i:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}^n$ parameterized by $\theta_i$:

$$f(q,\dot q;c)\approx\phi(q,\dot q;\Theta)\,a(c)=\sum_{i=1}^{m} a_i(c)\,\vec\phi_i(q,\dot q;\theta_i), \qquad (2)$$

where $a(c)=[a_i(c)]\in\mathbb{R}^m$ and the kernels are stacked such that $\phi(q,\dot q;\Theta)=[\vec\phi_i(q,\dot q;\theta_i)]$ and $\Theta=[\theta_i]$. In some embodiments, $a(c)$ can be a set of parameters that implicitly encodes the dependence of $f_a$ on the environmental conditions, c. Recall that $f_a$ is assumed to be a function of the state, $q$ and $\dot q$, and environmental conditions, c. It can be further assumed that the dependence can be linearly separated as $f_a(q,\dot q;c)=\phi(q,\dot q)\,a(c)$. In a number of embodiments, for fixed kernel or basis functions, $\phi$, and a given c, $a(c)$ is taken as the best least-squares estimator of $f_a$, so $a(c)$ is implicitly a function of the data for $f_a$, which is itself a function of the conditions c. The goal of meta-learning can be to learn the best set of basis functions $\phi$ under this separability assumption, and the adaptive control formulation in accordance with certain embodiments of the invention can adapt $a$ to the current environment conditions online.
Second, consider the alternative formulation where f(q,{dot over (q)};c) is approximated with a single neural network, where a represents the weights of its last layer, and {φi} represent the hidden states before the last layer. This can be explicitly written as
where êj represent the standard basis vectors.
In both cases, maximum representation error, ϵ, is
where Ξ is the compact domain of interest. Note, the boundedness of ϵ is apparent under the assumption of bounded Lipschitz constant of f(q,{dot over (q)},c) and bounded training error. Given Θ, the goal then can be to design a control law, τ(q,{dot over (q)},qd,{dot over (q)}d), that drives (q,{dot over (q)})→(qd,{dot over (q)}d), subject to dynamics in (1).
Quadrotor Position Control Subject to Uncertain Conditions
Given quadrotor states as global position $p\in\mathbb{R}^3$, velocity $v\in\mathbb{R}^3$, attitude rotation matrix $R\in SO(3)$, and body angular velocity $\omega\in\mathbb{R}^3$, consider the following dynamics:

$$\dot p=v,\qquad m\dot v=mg+Rf_u+f_a, \qquad (5a)$$
$$\dot R=RS(\omega),\qquad J\dot\omega=J\omega\times\omega+\tau_u+\tau_a, \qquad (5b)$$
where m and J are the mass and inertia matrix of the system, respectively, $S(\cdot)$ is the skew-symmetric mapping, $g=[0,0,-g]^T$ is the gravity vector, and $f_u=[0,0,T]^T$ and $\tau_u=[\tau_x,\tau_y,\tau_z]^T$ are the total thrust and body torques from four rotors predicted by a nominal model. $\eta=[T,\tau_x,\tau_y,\tau_z]^T$ denotes the output wrench. A typical quadrotor control input uses squared motor speeds $u=[n_1^2,n_2^2,n_3^2,n_4^2]^T$, and is linearly related to the output wrench $\eta=B_0 u$, with
where $C_T$ and $C_Q$ are rotor force and torque coefficients, and $l_{\text{arm}}$ denotes the length of the rotor arm. A key difficulty of precise landing can be the influence of unknown disturbance forces $f_a=[f_{a,x},f_{a,y},f_{a,z}]^T$ and torques $\tau_a=[\tau_{a,x},\tau_{a,y},\tau_{a,z}]^T$, which originate from complex aerodynamic interactions between the quadrotor and the environment, such as those due to ground effects and/or varying wind conditions.
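For illustration only, since the actuation matrix of eq. (6) depends on the airframe geometry, a plus-configuration quadrotor mixer of this general form could be constructed as follows; the coefficient values are hypothetical placeholders:

```python
import numpy as np

# Illustrative rotor coefficients (not from the specification)
C_T, C_Q, l_arm = 1.0e-5, 1.2e-7, 0.12   # thrust coeff, torque coeff, arm length [m]

# Plus-configuration mixer: maps u = [n1^2, n2^2, n3^2, n4^2]
# to the output wrench eta = [T, tau_x, tau_y, tau_z]
B0 = np.array([
    [C_T,          C_T,          C_T,          C_T         ],  # total thrust
    [0.0,          C_T * l_arm,  0.0,         -C_T * l_arm ],  # roll torque
    [-C_T * l_arm, 0.0,          C_T * l_arm,  0.0         ],  # pitch torque
    [-C_Q,         C_Q,         -C_Q,          C_Q         ],  # yaw torque
])

u = np.array([4.0e5, 4.0e5, 4.0e5, 4.0e5])  # squared motor speeds
eta = B0 @ u                                 # wrench [T, tau_x, tau_y, tau_z]
```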
Systems and methods in accordance with certain embodiments of the invention can improve controller accuracy by learning the unknown disturbance forces $f_a$ and/or torques $\tau_a$. In a number of situations (such as, but not limited to, landing, take-off, and flying in strong wind conditions), the attitude dynamics is limited and the aerodynamic disturbance torque $\tau_a$ can be bounded. Position dynamics eq. (5a) and $f_a$ can be the primary concern. Considering only position dynamics, eq. (5a) can be cast into the form of (1) by taking $H(q)=mI$, where I is the identity matrix, $C(q,\dot q)\equiv 0$, $g=mg$, $f(q,\dot q;c)=f_a$, and $\tau=Rf_u$. Note that the quadrotor attitude dynamics is just a special case of (1). In various embodiments, $f_a$ can be approximated using a DNN with spectral normalization to guarantee its Lipschitz constant, and then the DNN can be incorporated in the exponentially-stabilizing controller. In numerous embodiments, training can be done offline and the learned dynamics can be applied in the on-board controller in real time to achieve smooth flight paths in varied conditions.
Meta-Learning and Adaptation Goal
Suppose there is pre-collected meta-training data $D_{\text{meta}}=\{D_1, D_2, \dots, D_T\}$, with T sub datasets. In each sub dataset, $D_i$, there are $L_i$ state and force measurement pairs, $([q_k,\dot q_k], \hat f(q_k,\dot q_k,c_i))$, generated from some fixed but unknown wind condition, represented by $c_i$. The goal of meta-learning is to generate a set of parameters, $\Theta=\{\theta_i\}_{i=1}^m$, such that a linear combination of the neural net kernels, $\{\phi_i\}_{i=1}^m$, can represent any wind condition with small error.
Consequently, the adaptive controller aims to stabilize the system to a desired trajectory given the prior information of the dynamic model (1) and the learned kernels, (2) or (3). If exponential convergence is guaranteed, then the system is robust to aerodynamic effects not represented by the prior from meta learning, which is encapsulated in ϵ.
In a variety of embodiments, rather than training on multiple sub datasets for different wind conditions, training can be performed on a single data set, D, with variable wind conditions, c. The goal would be to learn a set of kernel functions that represents the state and small-time-scale dependence of the aerodynamic forces on the wind conditions. Then, the last layer parameters, a, can be slowly adapted as the wind conditions evolve. The key difference in training these kernels would be in solving for $a(\Theta)$. Now, instead of sampling the data set and solving for $a(\Theta)$ through least squares, a will be solved using least squares with exponential forgetting, similar to adaptive control, such that $a = a(t,\Theta)$.
Learning Residual Dynamics
In various embodiments, unknown disturbance forces fa can be learned using a DNN with Rectified Linear Units (ReLU) activation. In general, DNNs equipped with ReLU converge faster during training, demonstrate more robust behavior with respect to changes in hyperparameters, and have fewer vanishing gradient problems compared to other activation functions such as sigmoid.
ReLU Deep Neural Networks
A ReLU deep neural network represents the functional mapping from the input x to the output $f(x,\theta)$, parameterized by the DNN weights $\theta=\{W_1,\dots,W_{L+1}\}$:

$$f(x,\theta)=W_{L+1}\,\phi(W_L\,\phi(W_{L-1}(\cdots\phi(W_1 x)\cdots))), \qquad (7)$$
where the activation function $\phi(\cdot)=\max(\cdot,0)$ is called the element-wise ReLU function. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. However, deep neural networks are usually trained by first-order gradient-based optimization, which is highly sensitive to the curvature of the training objective and can be unstable. To alleviate this issue, processes in accordance with some embodiments of the invention can apply the spectral normalization technique.
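Before turning to normalization, a minimal sketch of the mapping in eq. (7); the layer sizes are arbitrary placeholders:

```python
import numpy as np

def relu_dnn(x, weights):
    """Evaluate f(x, theta) = W_{L+1} phi(W_L phi(... phi(W_1 x))) per eq. (7),
    with element-wise ReLU phi(.) = max(., 0) and no output activation."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)   # hidden layers with ReLU
    return weights[-1] @ h           # linear output layer

# Arbitrary layer sizes for illustration: 6 -> 32 -> 32 -> 3
rng = np.random.default_rng(0)
theta = [rng.standard_normal((32, 6)),
         rng.standard_normal((32, 32)),
         rng.standard_normal((3, 32))]
f = relu_dnn(rng.standard_normal(6), theta)
```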
Spectral Normalization
Spectral normalization can stabilize DNN training by constraining the Lipschitz constant of the objective function. Spectrally normalized DNNs have also been shown to generalize well, which is an indication of stability in machine learning. Mathematically, the Lipschitz constant of a function, $\|f\|_{\text{Lip}}$, is defined as the smallest value such that

$$\forall x,x':\ \|f(x)-f(x')\|_2/\|x-x'\|_2\le\|f\|_{\text{Lip}}.$$

It is known that the Lipschitz constant of a general differentiable function f is the maximum spectral norm (maximum singular value) of its gradient over its domain, $\|f\|_{\text{Lip}}=\sup_x\sigma(\nabla f(x))$.

The ReLU DNN in eq. (7) is a composition of functions (or layers). In several embodiments, the Lipschitz constant of a network can be bounded by constraining the spectral norm of each layer $g_l(x)=\phi(W_l x)$. For a linear map $g(x)=Wx$, the spectral norm of each layer is given by $\|g\|_{\text{Lip}}=\sup_x\sigma(\nabla g(x))=\sigma(W)$. Using the fact that the Lipschitz norm of the ReLU activation function $\phi(\cdot)$ is equal to 1, together with the inequality $\|g_1\circ g_2\|_{\text{Lip}}\le\|g_1\|_{\text{Lip}}\cdot\|g_2\|_{\text{Lip}}$, the following bound can be found on $\|f\|_{\text{Lip}}$:

$$\|f\|_{\text{Lip}}\le\prod_{l=1}^{L+1}\sigma(W_l). \qquad (8)$$
In certain embodiments, spectral normalization can be applied to the weight matrices in each layer during training as follows:

$$\bar W_l=\gamma^{1/(L+1)}\,\frac{W_l}{\sigma(W_l)}, \qquad (9)$$

where γ is the intended Lipschitz constant for the DNN. The following lemma can bound the Lipschitz constant of a ReLU DNN with spectral normalization.
Lemma 1. For a multi-layer ReLU network $f(x,\theta)$, defined in eq. (7) without an activation function on the output layer, using spectral normalization, the Lipschitz constant of the entire network satisfies

$$\|f(x,\bar\theta)\|_{\text{Lip}}\le\gamma,$$

with spectrally normalized parameters $\bar\theta=\{\bar W_1,\dots,\bar W_{L+1}\}$.

Proof. As in eq. (8), the Lipschitz constant can be written as a composition of spectral norms over all layers. The proof follows from the spectral norms constrained as in eq. (9).
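A minimal sketch of eq. (9), assuming the per-layer spectral norm is computed by exact SVD (power iteration is the usual cheaper alternative in practice):

```python
import numpy as np

def spectrally_normalize(weights, gamma=10.0):
    """Rescale each weight matrix per eq. (9) so that the product of
    spectral norms, and hence the network's Lipschitz bound, is at
    most gamma."""
    L1 = len(weights)                       # number of layers, L + 1
    per_layer = gamma ** (1.0 / L1)         # gamma^(1/(L+1)) budget per layer
    normalized = []
    for W in weights:
        sigma = np.linalg.norm(W, ord=2)    # largest singular value sigma(W)
        normalized.append(W * per_layer / sigma)
    return normalized
```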
Constrained Training
Gradient-based optimization in accordance with several embodiments of the invention can be applied to train the ReLU DNN with a bounded Lipschitz constant. In certain embodiments, estimating $f_a$ in (5) boils down to optimizing the parameters θ in the ReLU network in eq. (7), given the observed values of x and the target output. In particular, the Lipschitz constant of the ReLU network can be controlled.

The optimization objective in accordance with a variety of embodiments of the invention is as follows, where the prediction error is minimized subject to a constrained Lipschitz constant:

$$\min_\theta\ \frac{1}{T}\sum_{t=1}^{T}\|y_t-f(x_t,\theta)\|_2 \quad \text{subject to} \quad \|f\|_{\text{Lip}}\le\gamma, \qquad (10)$$

where $y_t$ is the observed disturbance force and $x_t$ is the observed states and control inputs. According to the upper bound in eq. (8), the constraint in accordance with a number of embodiments of the invention can be substituted by minimizing the spectral norm of the weights in each layer. In many embodiments, stochastic gradient descent (SGD) can be used to optimize eq. (10) while applying spectral normalization to regulate the weights. From Lemma 1, the trained ReLU DNN then has a Lipschitz constant bounded by γ.
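A sketch of one such constrained training step, reusing the relu_dnn and spectrally_normalize sketches above; the manual backpropagation and squared-error loss are simplifications for illustration:

```python
import numpy as np

def sgd_step(weights, x, y, lr=1e-3, gamma=10.0):
    """One SGD step on the prediction error of eq. (10), followed by
    spectral normalization (eq. (9)) to enforce the Lipschitz constraint."""
    # forward pass, caching layer activations
    hs = [x]
    for W in weights[:-1]:
        hs.append(np.maximum(W @ hs[-1], 0.0))
    pred = weights[-1] @ hs[-1]
    # backward pass for the squared prediction error
    delta = 2.0 * (pred - y)
    grads = [None] * len(weights)
    grads[-1] = np.outer(delta, hs[-1])
    delta = weights[-1].T @ delta
    for l in range(len(weights) - 2, -1, -1):
        delta = delta * (hs[l + 1] > 0)          # ReLU derivative
        grads[l] = np.outer(delta, hs[l])
        delta = weights[l].T @ delta
    new_weights = [W - lr * g for W, g in zip(weights, grads)]
    # regulate weights so the Lipschitz bound stays at gamma
    return spectrally_normalize(new_weights, gamma)
```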
Learning Kernel Functions
Recall that the meta-learning goal is to learn a set of kernel functions, $\{\phi(q,\dot q)\}$, such that for any wind condition, c, there exists a suitable a such that $\phi(q,\dot q;\Theta)\,a$ is a good approximation of $f(q,\dot q;c)$. This problem can be formulated as the minimization problem

$$\min_{\Theta}\ \min_{\{a_i\}}\ \sum_{i=1}^{T}\ \sum_{k\in D_i}\left\|\phi(q_k,\dot q_k;\Theta)\,a_i-\hat f_k\right\|^2, \qquad (11)$$

where the training data is divided into subsets, $D_i$, each corresponding to a fixed wind condition.
Note that (11) can be equivalently written as

$$\min_{\Theta}\ \sum_{i=1}^{T}\ \min_{a_i}\ \sum_{k\in D_i}\left\|\phi(q_k,\dot q_k;\Theta)\,a_i-\hat f_k\right\|^2. \qquad (12)$$

The inner problem, $\sum_i\min_{a_i}(\cdot)$, is a least-squares problem that can be solved independently for each sub dataset given Θ, using an adaptation subset $D_i^a\subseteq D_i$. Write the least-squares solution for a as

$$a=a_{LS}(\Theta,D_i^a). \qquad (13)$$

Note that this solution can be explicitly written as the solution to the stacked linear system

$$\Phi(\Theta)\,a=F, \qquad (14)$$

where $\Phi(\Theta)$ vertically stacks the kernel outputs $\phi(q_k,\dot q_k;\Theta)$, F stacks the force measurements $\hat f_k$, and K is the size of $D_i^a$. Therefore the least-squares solution will be

$$a_{LS}(\Theta,D_i^a)=(\Phi^T\Phi)^{-1}\Phi^T F. \qquad (15)$$
Now with a as a function of Θ, the outer problem in (12) can be solved using stochastic gradient descent on Θ in accordance with certain embodiments of the invention. An example of a meta-learning procedure is illustrated in
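A sketch of this procedure under the stated least-squares formulation; kernel_fn, the ridge term, and the reuse of the full sub dataset for both adaptation and evaluation are illustrative assumptions (in practice the outer problem in (12) would be optimized on Θ with SGD through an autodiff framework):

```python
import numpy as np

def a_least_squares(Phi, F, reg=1e-6):
    """Inner problem (eqs. (13)-(15)): given stacked kernel outputs Phi and
    stacked force measurements F for one wind condition, solve for a.
    A small ridge term is added for numerical stability (illustrative)."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(m), Phi.T @ F)

def meta_loss(Theta, datasets, kernel_fn):
    """Outer objective of eq. (12): sum of residuals with each a_i solved
    in closed form per sub dataset. `kernel_fn(q, qdot, Theta)` is an
    assumed callable returning the kernel matrix phi for one sample."""
    total = 0.0
    for (Q, Qdot, F_meas) in datasets:               # one wind condition each
        Phi = np.vstack([kernel_fn(q, qd, Theta) for q, qd in zip(Q, Qdot)])
        F = np.concatenate(F_meas)
        a = a_least_squares(Phi, F)
        total += np.sum((Phi @ a - F) ** 2)          # residual for this subset
    return total
```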
Adaptive controllers for 3-D trajectory tracking in accordance with various embodiments of the invention can be constructed as a nonlinear feedback linearization controller whose stability guarantees are obtained using the spectral normalization of the DNN-based ground-effect model. In some embodiments, the Lipschitz property of a DNN can be exploited to solve for the resulting control input using fixed-point iteration.
Reference Trajectory Tracking
The position tracking error can be defined as $\tilde p=p-p_d$. Controllers in accordance with numerous embodiments of the invention can use a composite variable s, where $s=0$ defines a manifold on which $\tilde p(t)\to 0$ exponentially:

$$s=\dot{\tilde p}+\Lambda\tilde p=\dot p-v_r, \qquad (16)$$

with Λ a positive-definite (e.g., diagonal) matrix. The trajectory tracking problem can now be transformed into tracking a reference velocity $v_r=\dot p_d-\Lambda\tilde p$.
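For example, eq. (16) reduces to a few lines; the gain matrix here is an arbitrary placeholder:

```python
import numpy as np

Lambda = np.diag([2.0, 2.0, 2.0])         # illustrative positive-definite gain

def composite_error(p, v, p_d, v_d):
    """Composite variable s and reference velocity v_r from eq. (16)."""
    p_tilde = p - p_d                      # position tracking error
    v_r = v_d - Lambda @ p_tilde           # reference velocity
    s = v - v_r                            # s = p_tilde_dot + Lambda p_tilde
    return s, v_r
```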
Define $\hat f_a(\zeta,u)$ as the DNN approximation to the disturbance aerodynamic forces, with ζ being the partial states used as input features to the network. The total desired rotor force $f_d$ can be designed as

$$f_d=(Rf_u)_d=m\dot v_r-mg-K_v s-\hat f_a(\zeta,u). \qquad (17)$$

Substituting eq. (17) into eq. (5), the closed-loop dynamics simply become $m\dot s+K_v s=\epsilon$, with approximation error $\epsilon=f_a-\hat f_a$. Hence, $\tilde p(t)\to 0$ globally and exponentially with bounded error, as long as $\|\epsilon\|$ is bounded.
Consequently, the desired total thrust $T_d$ and desired force direction $\hat k_d$ can be computed as

$$T_d=f_d\cdot\hat k, \quad\text{and}\quad \hat k_d=f_d/\|f_d\|, \qquad (18)$$
with $\hat k$ being the unit vector of the rotor thrust direction (typically the z-axis in quadrotors). Using $\hat k_d$ and fixing a desired yaw angle, the desired attitude $R_d$ can be deduced. Assume that a nonlinear attitude controller uses the desired torque $\tau_d$ from the rotors to track $R_d(t)$, such as:

$$\tau_d=J\dot\omega_r-J\omega\times\omega_r-K_\omega(\omega-\omega_r), \qquad (19)$$

where the reference angular rate $\omega_r$ is designed similarly to eq. (16), so that when $\omega\to\omega_r$, exponential trajectory tracking of a desired attitude $R_d(t)$ is guaranteed within some bounded error in the presence of bounded disturbance torques.
Learning-Based Discrete-Time Nonlinear Controller
From eqs. (6), (18) and (19), the desired wrench $\eta_d=[T_d,\tau_d^T]^T$ can be related to the control signal u through

$$B_0 u=\eta_d(u). \qquad (20)$$

Because of the dependency of $\hat f_a$ on u, the control synthesis problem here is non-affine. Therefore, in many embodiments, the following fixed-point iteration method can be used for solving eq. (20):
$$u_k=B_0^{-1}\,\eta_d(u_{k-1}), \qquad (21)$$

where $u_k$ and $u_{k-1}$ are the control inputs for the current and previous time steps of the discrete-time controller. The stability of the system and the convergence of the control inputs under eq. (21) are proved below.
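A minimal sketch of the iteration in eq. (21); eta_d_fn is an assumed callable that evaluates the desired wrench, including the DNN term $\hat f_a(\zeta,u)$:

```python
import numpy as np

def solve_control(B0, eta_d_fn, u_init, iters=5):
    """Fixed-point iteration of eq. (21): u_k = B0^{-1} eta_d(u_{k-1}).
    Convergence follows from the contraction argument below when
    sigma(B0^{-1}) * L_a < 1."""
    B0_inv = np.linalg.inv(B0)
    u = u_init
    for _ in range(iters):
        u = B0_inv @ eta_d_fn(u)   # one fixed-point update
    return u
```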
Robust Composite Adaptation
Recall the control design objective is to design a control system that leverages the kernels, $\phi(q,\dot q;\Theta)$, to stabilize the system defined in (1) to some desired trajectory $(q_d,\dot q_d)$. Treating Θ as fixed, the dependence on Θ is suppressed in this section. The control system will have two parts: a control law, $\tau(q,\dot q,q_d,\dot q_d,\hat a)$, and an update law, $\hat a(q,\dot q,q_d,\dot q_d,\tau)$.
In the process of designing the control system, a few key assumptions can be made.
Assumption 1. The desired trajectory and its first and second derivatives, $\{q_d(t),\dot q_d(t),\ddot q_d(t)\}$, are bounded.
Assumption 2. The flown flight trajectory, $(q(t),\dot q(t))$, and the current wind conditions, c, are a subset of Ξ. Thus, the optimal parameters for the flown flight trajectory and current wind conditions, given by $a=a_{LS}(\Theta,(q(t),\dot q(t),f(q(t),\dot q(t),c)))$, with pointwise representation error $d(q,\dot q)=\|\phi(q,\dot q;\Theta)\,a-f(q,\dot q;c)\|$, have maximum representation error along the flown flight trajectory, $\bar d$, less than the maximum global representation error ϵ. That is,

$$\bar d=\sup_t\, d(q(t),\dot q(t))\le\epsilon.$$
Note that for time varying optimal parameters, a=a(t), the same formulation can be followed, but with an additional disturbance term proportional to {dot over (a)}.
Nonlinear Control Law
In formulating the control problem, the composite velocity tracking error term, s, and the reference velocity, $\dot q_r$, can be defined such that $s=\dot q-\dot q_r=\dot{\tilde q}+\Lambda\tilde q$, where $\tilde q=q-q_d$ is the position tracking error and Λ is a positive-definite control gain. Then, given the parameter estimate $\hat a$, the following control law can be defined:
$$\tau=H(q)\ddot q_r+C(q,\dot q)\dot q_r+g(q)+\phi(q,\dot q)\hat a-Ks, \qquad (24)$$
where K is another positive definite control gain. Combining (1) and (24) leads to the closed-loop dynamics of
$$H(q)\dot s+(C(q,\dot q)+K)s=\phi(q,\dot q)\tilde a+d(q,\dot q), \qquad (25)$$

where $\tilde a=\hat a-a$ is the parameter estimation error.
Composite Adaptation Law
Systems and methods in accordance with certain embodiments of the invention can define an adaptation law that combines a tracking error update term, a prediction error update term, and a regularization term. First, the prediction error can be defined as

$$e(q,\dot q)\triangleq\phi(q,\dot q)\hat a-f(q,\dot q;c)=\phi(q,\dot q)\tilde a+d(q,\dot q). \qquad (26)$$
Next, the right-hand side of (26) can be filtered with a stable first-order filter with impulse response w(t) to define the filtered prediction error

$$e_1(\hat a,t)\triangleq W(t)\hat a-y_1(t)=W(t)\tilde a+d_1(t), \qquad (27)$$

with filtered measurement $y_1(t)=\int_0^t w(t-r)\,y(r)\,dr$, filtered kernel function $W(t)=\int_0^t w(t-r)\,\phi(r)\,dr$, and filtered disturbance $d_1(t)=\int_0^t w(t-r)\,d(r)\,dr$.
Now consider the following cost function in accordance with certain embodiments of the invention:

$$J_2(\hat a)=\int_0^t e^{-\lambda(t-r)}\|W(r)\hat a-y_1(r)\|^2\,dr+\gamma\|\hat a\|^2. \qquad (28)$$
Note this is closely related to the cost function defined in (11), with three modifications: the inclusion of an exponential forgetting factor, which will lead to exponential convergence of the parameter estimate; the regularization term $\gamma\|\hat a\|^2$; and the use of the filtered kernel W and filtered measurement $y_1$ in place of the raw quantities.
Note that $J_2$ is quadratic and convex in $\hat a$, leading to a simple closed-form solution for $\hat a$. However, evaluating that solution requires integrating over the entire trajectory at every time step, so differentiating the closed-form solution for $\hat a$ instead gives the following prediction-error update law with regularization.
$$\dot{\hat a}=-P\left(\lambda\gamma\,\hat a+W^T e_1\right), \qquad (29)$$
$$\dot P=\lambda P-P\left(W^T W+\lambda\gamma I\right)P, \qquad (30)$$

where

$$P=\left(\int_0^t e^{-\lambda(t-r)}\,W^T W\,dr+\gamma I\right)^{-1}. \qquad (31)$$
Processes in accordance with various embodiments of the invention can define the composite adaptation law with regularization, which incorporates an additional tracking error based term proportional to s into (29). As described below, tracking error terms in accordance with many embodiments of the invention exactly cancel the ã term in (25).
$$\dot{\hat a}=-P\left(\lambda\gamma\,\hat a+W^T e_1+\phi^T s\right). \qquad (32)$$
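A discrete-time sketch of eqs. (30)-(32) with simple Euler integration; the first-order filter producing W and $y_1$ is abstracted into the arguments, and the gains are illustrative:

```python
import numpy as np

def adaptation_step(a_hat, P, W, y1, phi, s, lam=0.5, gamma=1e-3, dt=0.01):
    """One Euler step of the composite adaptation law.
    W, y1: filtered kernel matrix and filtered measurement (eq. (27));
    phi:   current kernel matrix; s: composite tracking error."""
    e1 = W @ a_hat - y1                                  # filtered prediction error
    # eq. (32): composite update combining prediction and tracking terms
    a_dot = -P @ (lam * gamma * a_hat + W.T @ e1 + phi.T @ s)
    # eq. (30): gain update with exponential forgetting and regularization
    P_dot = lam * P - P @ (W.T @ W + lam * gamma * np.eye(P.shape[0])) @ P
    return a_hat + dt * a_dot, P + dt * P_dot
```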
Theorem 2. Under Assumptions 1 and 2 and using the control law defined in (24), the composite tracking error and parameter estimation error evolving according to the dynamics in (25) and the adaptation law in (30)-(32) exponentially converge to the error ball
where $k=\lambda_{\min}(K)$ and $\kappa(\cdot)$ is the condition number.
Proof. Rearranging the composite tracking error dynamics and the parameter estimate dynamics, defined in (25) and (32), and using the derivative of
Consider the Lyapunov-like function $V=y^T M y$, with
and metric function, M, given by
Using the closed-loop dynamics given in (34) and the skew-symmetric property of $\dot M-2C$, the inequality relationship for the derivative of V is
Applying the transformation $\mathcal W=\sqrt{y^T M y}$ and a comparison lemma results in
thus proving the exponential convergence to the error ball given in (33).
Neural Lander Stability Proof
The closed-loop tracking error analysis provides direct guidance on how to tune the neural network and controller parameters to improve control performance and robustness.
Control Allocation as Contraction Mapping
The control input $u_k$ converges to the solution of eq. (20) when all states are fixed, as shown below.

Lemma 3. Define the mapping $u_k=\mathcal F(u_{k-1})$ based on eq. (21), and fix all current states. If $\hat f_a(\zeta,u)$ is $L_a$-Lipschitz continuous, and $\sigma(B_0^{-1})\cdot L_a<1$, then $\mathcal F(\cdot)$ is a contraction mapping, and $u_k$ converges to the unique solution of $u^*=\mathcal F(u^*)$.

Proof. For all $u_1,u_2\in\mathcal U$, with $\mathcal U$ being a compact set of feasible control inputs, and given fixed states:

$$\|\mathcal F(u_1)-\mathcal F(u_2)\|_2=\|B_0^{-1}(\hat f_a(\zeta,u_1)-\hat f_a(\zeta,u_2))\|_2\le\sigma(B_0^{-1})\cdot L_a\,\|u_1-u_2\|_2.$$

Thus, $\exists\,\alpha<1$ such that $\|\mathcal F(u_1)-\mathcal F(u_2)\|_2<\alpha\|u_1-u_2\|_2$. Hence, $\mathcal F(\cdot)$ is a contraction mapping.
The fixed-point iteration is a zeroth-order method, since it does not rely on any derivatives to solve the equation. In several embodiments, higher-order methods can be used (with some computational trade-offs) to solve non-affine control problems, which may not meet the convergence criteria for the fixed-point iteration.
Stability of Learning-Based Nonlinear Controller
Before continuing to prove the stability of the full system, the following assumptions are made.

Assumption 3. The desired states along the position trajectory $p_d(t)$, $\dot p_d(t)$, and $\ddot p_d(t)$ are bounded.
Assumption 4. One-step difference of control signal satisfies ∥uk−uk−1∥≤ρ∥s∥ with a small positive ρ.
For the intuition behind this assumption, from eq. (20), the following approximate relation can be derived with $\Delta(\cdot)_k=\|(\cdot)_k-(\cdot)_{k-1}\|$:

$$\Delta u_k\le\sigma(B_0^{-1})\left(L_a\Delta u_{k-1}+L_a\Delta\zeta_k+m\,\Delta\dot v_{r,k}+\lambda_{\max}(K_v)\,\Delta s_k+\Delta\tau_{d,k}\right).$$
Because the update rates of the attitude controller (>100 Hz) and motor speed control (>5 kHz) are much higher than that of the position controller (≈10 Hz), in practice $\Delta s_k$, $\Delta\dot v_{r,k}$, and $\Delta\zeta_k$ can be safely neglected in one update. Furthermore, $\Delta\tau_{d,k}$ can be limited internally by the attitude controller. This leads to:

$$\Delta u_k\le\sigma(B_0^{-1})\left(L_a\Delta u_{k-1}+c\right),$$

with c a small constant. Since $\sigma(B_0^{-1})\cdot L_a<1$ from Lemma 3, it can be deduced that Δu rapidly converges to a small ultimate bound between each position controller update.
Assumption 5. The learning error of $\hat f_a(\zeta,u)$ over the compact sets $\zeta\in Z$, $u\in\mathcal U$ is upper bounded by $\epsilon_m=\sup_{\zeta,u}\|\epsilon(\zeta,u)\|$, where $\epsilon(\zeta,u)=f_a(\zeta,u)-\hat f_a(\zeta,u)$.
DNNs have been shown to generalize well to the set of unseen events that are from almost the same distribution as the training set. This empirical observation is also theoretically studied in order to shed more light toward an understanding of the complexity of these models. Based on the above assumptions, the overall stability and robustness results are presented below.
Theorem 4. Under Assumptions 3-5, for a time-varying $p_d(t)$, the controller defined in eqs. (17) and (21) with $\lambda_{\min}(K_v)>L_a\rho$ achieves exponential convergence of the composite variable s to the error ball $\lim_{t\to\infty}\|s(t)\|=\epsilon_m/(\lambda_{\min}(K_v)-L_a\rho)$ with rate $(\lambda_{\min}(K_v)-L_a\rho)/m$, and $\tilde p$ exponentially converges to the error ball
with rate $\lambda_{\min}(\Lambda)$.
Proof. Select a Lyapunov function as $V(s)=\tfrac{1}{2}m\|s\|^2$, then apply the controller eq. (17) to get the time derivative of V:

$$\dot V=s^T\left(-K_v s+\hat f_a(\zeta_k,u_k)-\hat f_a(\zeta_k,u_{k-1})+\epsilon(\zeta_k,u_k)\right)\le -s^T K_v s+\|s\|\left(\|\hat f_a(\zeta_k,u_k)-\hat f_a(\zeta_k,u_{k-1})\|+\epsilon_m\right).$$
Let $\lambda=\lambda_{\min}(K_v)$ denote the minimum eigenvalue of the positive-definite matrix $K_v$. By applying the Lipschitz property of $\hat f_a$ (Lemma 1) and Assumption 4, we obtain
Using the Comparison Lemma, define $\mathcal W(t)=\sqrt{V(t)}=\sqrt{m/2}\,\|s\|$ and $\dot{\mathcal W}=\dot V/(2\sqrt V)$ to obtain
It can be shown that this leads to finite-gain $\mathcal L_p$ stability and input-to-state stability (ISS). Furthermore, the hierarchical combination of s and $\tilde p$ in eq. (16) results in $\lim_{t\to\infty}\|\tilde p(t)\|=\lim_{t\to\infty}\|s(t)\|/\lambda_{\min}(\Lambda)$, yielding (43).
A learning-based composite-adaptation controller in accordance with many embodiments of the invention was implemented and tested on an INTEL Aero Ready to Fly Drone. It was tested with three different trajectories for each of three different kernel functions. In each test, wind conditions were generated using CAST's open-air wind tunnel. The first test had the drone hover in increasing wind speeds. The second test had the drone move quickly between different set points with increasing wind speeds. These time-varying wind conditions showed the ability of the controller to adapt to new conditions in real time. The third test had the drone fly a figure-8 pattern in constant wind, as described below.
The INTEL Aero Drone incorporates a PX4 flight controller with the INTEL Aero Compute Board, which runs Linux on a 2.56 GHz Intel Atom x7 processor with 4 GB RAM. The controller was implemented on the Linux board and sent thrust and attitude commands to the PX4 flight controller using MAVROS software. CAST's OptiTrack motion capture system was used for global position information, which was broadcast to the drone via WiFi. An Extended Kalman Filter (EKF) running on the PX4 controller filtered the IMU and motion capture information to produce position and velocity estimates. The CAST open-air wind tunnel consists of approximately 1,400 distributed fans, each individually controllable, in a 3 by 3 meter grid.
Data Collection and Kernel Training
Position, velocity, acceleration, and motor speed data was gathered by flying the drone on a random-walk trajectory at 0, 1.3, 2.5, 3.7, and 4.9 m/s wind speeds for 2 minutes each, to generate training data. The trajectory was generated by randomly moving to different set points in a predefined cube centered in front of the wind tunnel. Then, using the dynamics equations defined previously, the aerodynamic disturbance force, f, was computed.
Three different kernel functions were used in the tests. The first was an identity kernel, φ≡I. Note that with only a tracking error update term in the adaptation law, this would be equivalent to integral control. The second and third kernels were the vector and scalar kernels, defined in (2) and (3), respectively.
During offline training, the kernels were validated by estimating a using a least squares estimator on a continuous segment of a validation trajectory. Then, the predicted force was compared to the measured force for another part of the validation trajectory.
Hovering in Increasing Wind
In this test, the drone was set to hover at a fixed height centered in the wind tunnel test section. The wind tunnel was set to 2.5 m/s for 15 seconds, then 4.3 m/s for 10 seconds, then 6.2 m/s for 10 seconds.
In this test, each controller achieves similar parameter convergence. This is likely because, in each case, the kernel functions approach a constant value as the drone converges to the target hover position, and each controller uses the same adaptation law. Note however, as seen in
Random Walk with Increasing Wind
The second test had the drone move quickly between random set points in a cube centered in front of the wind tunnel, with a new set point generated every second for 60 seconds. For the first 20 seconds, the wind tunnel was set to 2.5 m/s, for the second 20 seconds, 4.3 m/s, and for the last 20 seconds, 6.2 m/s. The random number generator seed was fixed before each test so that each controller received the exact same set points.
Note that the desired trajectory for this test had sudden changes in desired acceleration and velocity when the set point was moved. Thus, the composite velocity error is significantly higher than in the other tests. The learned kernel methods in both cases outperformed the constant kernel method in prediction error performance, but all three methods had similar tracking error performance, as seen in
Figure 8 with Constant Wind
The third trajectory was a figure-8 pattern oriented up and down (z-axis) and towards and away from the wind tunnel (x-axis). This test used a fixed wind speed of 4.3 m/s. In each test, the drone started from hovering near the center of the figure 8. Then, the wind tunnel was turned on and allowed to begin ramping up for 5 seconds, and the figure-8 trajectory was flown repeatedly for one minute. Each loop around the figure 8 took 8 seconds.
In this test, there is a striking difference between the prediction error performance of the learned kernels and that of the constant kernel, as seen in
Results
Three key error metrics are given for each kernel and trajectory in
An integrated approach in accordance with a number of embodiments of the invention uses prior data to develop a drone controller capable of adapting to new and changing wind conditions. A meta-learning formulation to the offline training helped design kernel functions that can represent the dynamics effects observed in the training data. An adaptive controller in accordance with some embodiments of the invention can exponentially stabilize the system.
In experiments, the learned kernels were able to reduce prediction error relative to a constant kernel. However, this did not translate into improved tracking error performance. This could be caused by a combination of attitude tracking error, input saturation, and dependence of unmodeled dynamics on the control input. Both input saturation and attitude tracking error were shown to lead to increased position tracking error. Different aerodynamic effects can cause a change in rotor thrust, usually modeled as a change in the coefficient of thrust.
As illustrated by the results, adaptive control (with either constant kernel or learned kernel) in accordance with various embodiments of the invention is able to effectively compensate for the unmodeled aerodynamic effects and adapt to changing conditions in real time.
Control System
In certain embodiments, a control system operates entirely within a single device (e.g., a quadrotor with onboard compute). Control systems in accordance with several embodiments of the invention can distribute functions across multiple devices and services. A control system in accordance with some embodiments of the invention is shown in
Users may use personal devices 1180 and 1120 that connect to the network 1160 to perform processes that train adaptive controllers and/or operate the controllers to control a device (e.g., quadrotors, drones, etc.) in accordance with various embodiments of the invention. In the shown embodiment, the personal devices 1180 are shown as desktop computers that are connected via a conventional “wired” connection to the network 1160. However, the personal device 1180 may be a desktop computer, a laptop computer, a smart television, an entertainment gaming console, or any other device that connects to the network 1160 via a “wired” connection. The mobile device 1120 connects to network 1160 using a wireless connection. A wireless connection is a connection that uses Radio Frequency (RF) signals, Infrared signals, or any other form of wireless signaling to connect to the network 1160. In
As can readily be appreciated, the specific computing system used to train adaptive controllers and/or operate the controllers is largely dependent upon the requirements of a given application and should not be considered as limited to any specific computing system(s) implementation.
Control Element
An example of a control element that executes instructions to perform processes that control a plant in accordance with various embodiments of the invention is shown in
One skilled in the art will recognize that a particular control element may include other components that are omitted for brevity without departing from this invention. The processor 1205 can include (but is not limited to) a processor, microprocessor, controller, or a combination of processors, microprocessors, and/or controllers that performs instructions stored in the memory 1220 to manipulate data stored in the memory. Processor instructions can configure the processor 1205 to perform processes in accordance with certain embodiments of the invention.
Peripherals 1210 can include any of a variety of components for capturing data, such as (but not limited to) cameras, displays, and/or sensors. In a variety of embodiments, peripherals can be used to gather inputs and/or provide outputs. Network interface 1215 allows control element 1200 to transmit and receive data over a network based upon the instructions performed by processor 1205. Peripherals and/or network interfaces in accordance with many embodiments of the invention can be used to gather inputs that can be used to determine the state of a device (e.g., attitude, position, velocity, acceleration, etc.). Memory 1220 includes an adaptive controller 1225, sensor data 1230, and model data 1235.
Sensor data in accordance with many embodiments of the invention can include data recorded by sensors of a device, such as (but not limited to) image data, accelerometer data, gyroscope data, altimeter data, GPS data, etc. In several embodiments, model data can store various parameters and/or weights for dynamics models. Model data in accordance with many embodiments of the invention can be updated through offline training based on recorded training data and/or through online training based on operation of a device in an uncertain environment.
Although a specific example of a control element 1200 is illustrated in
Although the present invention has been described in certain specific aspects, many additional modifications and variations would be apparent to those skilled in the art. It is therefore to be understood that the present invention may be practiced otherwise than specifically described. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive.
The present application claims the benefit of and priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/777,646 entitled “Neural Lander: Stable Drone Landing Control Using Learned Dynamics” filed Dec. 10, 2018. The disclosure of U.S. Provisional Patent Application No. 62/777,646 is hereby incorporated by reference in its entirety for all purposes.
This invention was made with government support under Grant No. HR0011-18-9-0035 awarded by DARPA. The government has certain rights in the invention.