The presently disclosed subject matter relates generally to neural networks and physics based training. More particularly, the present subject matter described herein relates to methods, systems, and computer readable media for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos.
Presently, artificial neural networks are popular tools in industry and academia, especially for classification and regression, and are beginning to elucidate nonlinear dynamics and fundamental physics. Recent neural networks outperform traditional techniques in symbolic integration and numerical integration and outperform humans in strategy games, such as chess and Go. But neural networks have a blind spot as they are unaware of the chaos and strange attractors of nonlinear dynamics, where exponentially separating trajectories bounded by finite energy repeatedly stretch and fold into complicated self-similar fractals. Attempts by neural networks to learn and predict nonlinear dynamics can be frustrated by ordered and chaotic orbits (e.g., irregular dynamic behavior) coexisting at the same energy for different initial positions and momenta.
Recent research features artificial neural networks that incorporate Hamiltonian structure to learn fundamental dynamical systems. But from stormy weather to swirling galaxies, natural dynamics is far richer and more challenging.
Accordingly, a need exists for improved methods, systems, and computer readable media for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos.
Methods, systems, and computer readable media for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos are disclosed. In one embodiment, the method includes utilizing a neural network (NN) pre-processor to convert generic coordinates associated with a dynamical system to canonical coordinates, concatenating a Hamiltonian neural network (HNN) to the NN pre-processor to create a generalized HNN, and training the generalized HNN to learn nonlinear dynamics present in the dynamical system from generic training data. The method also includes utilizing the trained generalized HNN to forecast the nonlinear dynamics, and quantifying chaotic behavior from the forecasted nonlinear dynamics to discover and map one or more transitions between orderly states and chaotic states exhibited by the dynamical system.
According to another aspect of the subject matter described herein, a method wherein the generalized HNN is utilized to execute applications including a self-driving automobile application, a drone piloting application, a tracking application, an aerospace application, a social network dynamic application, and a control system application.
According to another aspect of the subject matter described herein, a method wherein the dynamical system is a nonlinear system.
According to another aspect of the subject matter described herein, a method wherein the generalized HNN is further configured to detect when a macroscopic system is unable to be modeled using Hamiltonian dynamics.
According to another aspect of the subject matter described herein, a method wherein the generalized HNN is trained using physics-informed machine learning.
According to another aspect of the subject matter described herein, a method wherein the generalized HNN is a feed-forward neural network that is configured to learn from the generic training data.
According to another aspect of the subject matter described herein, a method wherein a customized loss function is utilized to compel a Hamiltonian phase space flow.
According to another aspect of the subject matter described herein, a method wherein the generalized HNN utilizes a neural network autoencoder to capture dimensionality.
According to another aspect of the subject matter described herein, a method wherein the dynamical system is a Hénon-Heiles system.
According to another aspect of the subject matter described herein, a method wherein the chaotic behavior is quantified using a smaller alignment index (SALI) metric.
In another embodiment, a system for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos includes at least one processor, a memory element, and a neural network pre-processor configured to convert generic coordinates associated with a dynamical system to canonical coordinates. The system further includes an augmented HNN engine stored in the memory element and when executed by the at least one processor is configured for concatenating a Hamiltonian neural network (HNN) to the NN pre-processor to create a generalized HNN, training the generalized HNN to learn nonlinear dynamics present in the dynamical system from generic training data, utilizing the trained generalized HNN to forecast the nonlinear dynamics, and quantifying chaotic behavior from the forecasted nonlinear dynamics to discover and map one or more transitions between orderly states and chaotic states exhibited by the dynamical system.
The subject matter described herein may be implemented in hardware, software, firmware, or any combination thereof. As such, the terms “function”, “node”, “engine”, or “module” as used herein refer to hardware, which may also include software and/or firmware components, for implementing the feature being described. In one exemplary implementation, the subject matter described herein may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer control the computer to perform steps. Exemplary computer readable media suitable for implementing the subject matter described herein include non-transitory computer-readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The subject matter described herein will now be explained with reference to the accompanying drawings of which:
The subject matter described herein relates to methods, systems, and computer readable media for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos. Notably, the disclosed subject matter leverages the Hamiltonian structure of conservative systems to provide neural networks with the physics intelligence needed to learn the mix of order and chaos that often characterizes natural phenomena. In some embodiments, the disclosed subject matter comprises an Artificial Intelligence Hamilton (AIH) software engine that instantiates an Advanced Hamiltonian Neural Network (AHNN) method. After reviewing Hamiltonian chaos and neural networks, the disclosed subject matter can apply Hamiltonian neural networks to the Hénon-Heiles potential, a numerical and dynamical benchmark that can model both stellar and molecular dynamics. Even as these systems transition from order to chaos, Hamiltonian neural networks correctly learn their dynamics, overcoming deep learning's chaos blindness. If chaos is a nonlinear “super power,” enabling deterministic dynamics to be practically unpredictable, then the Hamiltonian is the ingredient that enables a neural network to learn and forecast both order and chaos.
The Hamiltonian formalism describes phenomena from astronomical scales to quantum scales. Even dissipative systems involving friction or viscosity are microscopically Hamiltonian. It reveals underlying structures in position-momentum phase space and reflects essential symmetries in physical systems. Its elegance stems from its geometric structure, where positions q and conjugate momenta p form a set of 2N canonical coordinates describing a physical system with N degrees of freedom. A single Hamiltonian function uniquely generates the time evolution of the system via the 2N coupled differential equations
$$\{\dot q, \dot p\} = \{dq/dt,\, dp/dt\} = \{+\partial\mathcal{H}/\partial p,\, -\partial\mathcal{H}/\partial q\},$$
where the overdots are Newton's notation for time derivatives.
This classical formalism exhibits two contrasting dynamics: simple integrable motion suggests a “clockwork universe,” while complicated nonintegrable motion suggests a chaotic one. Additional conserved quantities constrain integrable orbits to smooth N-dimensional KAM tori in 2N-dimensional phase space.
The Hénon-Heiles potential, which models phenomena ranging from the orbits of stars to the vibrations of molecules, provides an example of such an order-to-chaos transition. In a four-dimensional phase space {q, p}={qx, qy, px, py}, its nondimensionalized Hamiltonian is represented as:
$$\mathcal{H} = (p_x^2 + p_y^2)/2 + (q_x^2 + q_y^2)/2 + \left(q_x^2 q_y - q_y^3/3\right),$$
which is the sum of the kinetic and potential energies, including quadratic harmonic terms perturbed by cubic nonlinearities that convert a circularly symmetric potential into a triangularly symmetric potential. Bounded motion is possible in a triangular region of the $\{q_x, q_y\}$ plane for energies 0 < E < 1/6. As orbital energy increases, circular symmetry degenerates to triangular symmetry, integrable motion degrades to nonintegrable motion, KAM tori become cantori, and ordered islands give way to a chaotic sea.
While traditional analyses focus on forecasting orbits or understanding fractal structure, understanding the entire landscape of dynamical order and chaos requires new tools. Artificial neural networks are today widely used and studied partly because they can approximate any continuous function. Recent efforts to apply artificial neural networks to chaotic dynamics involve the recurrent neural networks of reservoir computing. Instead, the dominant feed-forward neural networks of deep learning can be exploited.
Inspired by natural neural networks, the activity $a_l = \sigma[\mathsf{W}_l a_{l-1} + b_l]$ of each layer of a conventional feed-forward neural network is the nonlinear step or ramp of the linearly transformed activities of the previous layer, where σ is a vectorized nonlinear function that mimics the on-off activity of a natural neuron, the $a_l$ are activation vectors, and the $\mathsf{W}_l$ and $b_l$ are adjustable weight matrices and bias vectors that mimic the dendrite and axon connectivity of natural neurons. Concatenating multiple layers eliminates the hidden neuron activities, so the output $y = f_P[x]$ is a parametrized nonlinear function of just the input x and the weights and biases $P = \{\mathsf{W}_l, b_l\}$. A training session inputs multiple x and adjusts the weights and biases to minimize the difference or “loss” $\mathcal{L} = (y_t - y)^2$ between the target $y_t$ and the output y so the neural network learns the correspondence.
In the accompanying drawings, a conventional neural network ‘NN’ 202 intakes positions and velocities $\{q, \dot q\}$, outputs the corresponding velocities and accelerations, and adjusts its weights and biases to minimize the loss

$$\mathcal{L}_{\mathrm{NN}} = (\dot q_t - \dot q)^2 + (\ddot q_t - \ddot q)^2$$
until it learns the correct mapping. In contrast, Hamiltonian neural network ‘HNN’ 204 intakes positions and momenta {q, p}, outputs the scalar function $\mathcal{H}$, takes one gradient to find the position and momentum rates of change, and minimizes the loss
$$\mathcal{L}_{\mathrm{HNN}} = (\dot q_t - \partial\mathcal{H}/\partial p)^2 + (\dot p_t + \partial\mathcal{H}/\partial q)^2,$$
which enforces Hamilton's equations of motion. For a given time step dt, each trained network can extrapolate a given initial condition with an Euler update $\{q, p\} \leftarrow \{q, p\} + \{\dot q, \dot p\}\,dt$ or some better integration scheme.
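By way of illustration only, the following is a minimal Python sketch of such an extrapolation, assuming a trained PyTorch HNN module `hnn` that maps a phase-space point [q, p] to a scalar; the module and function names, the step size, and the plain Euler update are illustrative assumptions, and a symplectic or higher-order integrator can be substituted.

```python
import torch

def hnn_time_derivative(hnn, z):
    """Return (dq/dt, dp/dt) from a trained HNN via Hamilton's equations.

    z is a tensor [q, p]; the symplectic gradient of the scalar output
    H = hnn(z) gives dq/dt = +dH/dp and dp/dt = -dH/dq.
    """
    z = z.detach().clone().requires_grad_(True)
    H = hnn(z).sum()
    dHdz = torch.autograd.grad(H, z)[0]
    dHdq, dHdp = dHdz[..., 0], dHdz[..., 1]
    return torch.stack([dHdp, -dHdq], dim=-1)

def euler_rollout(hnn, z0, dt=1e-2, steps=1000):
    """Extrapolate an initial condition z0 = [q, p] with the Euler update
    {q, p} <- {q, p} + {dq/dt, dp/dt} dt.  A higher-order or symplectic
    integrator can be substituted for better energy conservation."""
    trajectory = [z0]
    z = z0
    for _ in range(steps):
        z = z + dt * hnn_time_derivative(hnn, z)
        trajectory.append(z.detach())
    return torch.stack(trajectory)
```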
Loosely, a NN 202 learns the orbits, while HNN 204 learns the Hamiltonian. Geometrically, NN 202 learns the generalized velocities, the dual mappings $\{q, \dot q\} \to \dot q$ and $\{q, \dot q\} \to \ddot q$, while HNN 204 learns the Hamiltonian generator function, the single mapping $\{q, p\} \to \mathcal{H}$, whose (symplectic) gradient gives the generalized velocities $\{\dot q, \dot p\}$. With the same resources, HNN 204 outperforms NN 202, and the advantage grows as the phase space dimension increases, where q and p are multicomponent vectors.
In some embodiments of the disclosed subject matter, neural networks may be “stress tested” on the Hénon-Heiles system, as its mixed phase space of order and chaos is an especially challenging dynamical scenario to identify and decipher. For selected bounded energies, and for the same learning parameters detailed in table 300, both NN and HNN can be trained on orbits generated from randomly sampled initial conditions.
To quantify the ability of NN and HNN to paint a full portrait of the global, mixed phase space dynamics, the NN's and HNN's knowledge of the system is used to estimate the Hénon-Heiles Lyapunov spectrum, which characterizes the separation rate of infinitesimally close trajectories, one exponent for each dimension. Since perturbations along the flow do not cause divergence away from it, at least one exponent will be zero. For a Hamiltonian system, the exponents exist in diverging-converging pairs to conserve phase space volume. Hence, a spectrum like {−λ, 0, 0, +λ} is expected.
Using NN and HNN, the smaller alignment index α can be computed. This index is a metric of chaos that allows one to quickly find the fraction of orbits that are chaotic at any energy. Further, α can be computed for a specific orbit by following the time evolution of two different normalized deviation vectors along the orbit and computing the minimum of the norms of their difference and sum. Via extensive testing, an orbit is chaotic if $\alpha < 10^{-8}$, indicating that its deviation vectors have been aligned or anti-aligned by a large positive Lyapunov exponent.
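One hedged sketch of such a SALI computation follows; the deviation vectors are evolved with a finite-difference tangent map of the learned flow, and the step size, iteration count, and function names are illustrative assumptions rather than the exact disclosed procedure.

```python
import numpy as np

def sali(flow, z0, dt=1e-2, steps=20000, eps=1e-8):
    """Smaller Alignment Index (SALI) for one orbit of a learned flow.

    flow(z) must return dz/dt for the phase-space point z (for example,
    the symplectic gradient of a trained HNN).  Two orthonormal deviation
    vectors are evolved alongside the orbit with a finite-difference
    tangent map, renormalized each step; their minimum aligned or
    anti-aligned distance is the SALI.
    """
    z = np.asarray(z0, dtype=float)
    d = z.size
    w1, w2 = np.eye(d)[0], np.eye(d)[1]        # initial deviation vectors
    alpha = 2.0
    for _ in range(steps):
        f = flow(z)
        # finite-difference tangent map applied to each deviation vector
        w1 = w1 + dt * (flow(z + eps * w1) - f) / eps
        w2 = w2 + dt * (flow(z + eps * w2) - f) / eps
        z = z + dt * f                          # Euler step for the orbit itself
        w1 /= np.linalg.norm(w1)                # keep deviation vectors normalized
        w2 /= np.linalg.norm(w2)
        alpha = min(np.linalg.norm(w1 - w2), np.linalg.norm(w1 + w2))
    return alpha

# An orbit is classified as chaotic if the returned alpha falls below ~1e-8.
```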
To understand what NN and HNN have learned when these neural networks forecast orbits, an autoencoder (e.g., a neural network with a sparse “bottleneck” layer) is used to examine their hidden neurons. The autoencoder's mean-square-error loss function forces the input to match the output, so its weights and biases adjust to create a compressed, low-dimensional representation of the neural networks' activity, a process called introspection. For HNN, the loss function $L_b$ drops precipitously for $N_b = 4$ (or more) bottleneck neurons, which appear to encode a combination of the four phase space coordinates, thereby capturing the dimensionality of the system.
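A minimal PyTorch sketch of such a bottleneck autoencoder follows; the layer widths and names are illustrative assumptions, and the mean-square-error loss simply forces the output to reproduce the recorded hidden-neuron activity as described above.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Autoencoder used to introspect a trained network's hidden activity.

    The encoder compresses recorded hidden-neuron activations down to
    n_bottleneck neurons; the decoder reconstructs them.  If the loss
    drops sharply at n_bottleneck = 4, the activity is effectively
    4-dimensional, matching the four Henon-Heiles phase-space coordinates.
    """
    def __init__(self, n_hidden, n_bottleneck):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_hidden, 32), nn.Tanh(),
                                     nn.Linear(32, n_bottleneck))
        self.decoder = nn.Sequential(nn.Linear(n_bottleneck, 32), nn.Tanh(),
                                     nn.Linear(32, n_hidden))

    def forward(self, activity):
        return self.decoder(self.encoder(activity))

def introspection_loss(model, activity):
    # Mean-square-error loss that forces the input to match the output.
    return ((model(activity) - activity) ** 2).mean()
```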
Billiards can model a wide range of real-world systems, spanning lasers, optical fibers, ray optics, acoustic and microwave cavities, quantum dots, and nanodevices. Billiards also elucidates the subtleties of quantum-classical correspondence and the challenging notion of quantum chaos.
Dynamicists typically idealize billiard tables with hard boundaries and discontinuous potentials. With similar phenomenology, billiard tables can be modeled with soft boundaries and continuous potentials, so the billiard balls' momenta change rapidly but continuously at each bounce. The Hamiltonian disclosed herein can be represented as:
$$\mathcal{H} = (p_x^2 + p_y^2)/2 + V[q_x, q_y],$$

with a potential energy V whose soft walls are set by the radius $r_o$ of the outer circle, the radius $r_i$ of the inner circle, the shift $\delta q_x$ of the inner circle, and the softness $\delta r$ of the walls, as shown in plot 1000.
The billiard tables are bounded by two circles, and the dynamics exhibit fascinating transitions as the circles resize or shift. Ordered and chaotic trajectories coexist for different initial conditions at the same energy, as illustrated in plot 1100.
The disclosed subject matter has demonstrated the efficient forecasting of Hamiltonian neural networks in diverse systems, both perturbative, like Hénon-Heiles with $\mathcal{H} = \mathcal{H}_0 + \epsilon\,\mathcal{H}_1$, and nonperturbative, like dynamical billiards. Other successful implementations include higher-dimensional Hénon-Heiles systems, and simple harmonic oscillators and pendulums with noisy training data. The disclosed subject matter has successfully used Hamiltonian neural networks to learn the dynamics of librating (back-and-forth) and rotating (end-over-end) single, double, and triple pendulums. The angles of rotating pendulums diverge, which makes them difficult for neural networks to learn. Consequently, the disclosed subject matter compactifies the pendulum phase space by wrapping it into a cylinder. It has been determined that both NN and HNN can learn the pendulums, and that the learning improves with the number of training pairs, but HNN is significantly better, as shown in plot 1200.
In some embodiments, the baseline neural network NN and Hamiltonian neural network HNN (which will also be referenced herein as an Advanced Hamiltonian Neural Network (AHNN)) can be implemented in the Python programming language using the PyTorch open source machine learning library. Table 300 summarizes exemplary training parameters.
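By way of example, and without limiting the parameters of table 300, one minimal PyTorch sketch of such an HNN and its Hamilton's-equations loss follows; the network width, activation function, and names are illustrative assumptions rather than the exact disclosed implementation.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Feed-forward network that outputs a single scalar, an energy-like H."""
    def __init__(self, dim=4, width=200):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.Tanh(),
                                 nn.Linear(width, width), nn.Tanh(),
                                 nn.Linear(width, 1))

    def forward(self, z):                     # z = [q, p] concatenated
        return self.net(z)

def hnn_loss(model, z, zdot_target):
    """Loss enforcing Hamilton's equations:
    (dq/dt - dH/dp)^2 + (dp/dt + dH/dq)^2, averaged over the batch."""
    z = z.detach().clone().requires_grad_(True)
    H = model(z).sum()
    dH = torch.autograd.grad(H, z, create_graph=True)[0]
    n = z.shape[-1] // 2
    dHdq, dHdp = dH[..., :n], dH[..., n:]
    qdot_t, pdot_t = zdot_target[..., :n], zdot_target[..., n:]
    return ((qdot_t - dHdp) ** 2 + (pdot_t + dHdq) ** 2).mean()

# Training sketch: an Adam optimizer on batches of {q, p} -> {qdot, pdot} pairs,
# e.g., optimizer = torch.optim.Adam(model.parameters(), lr=1e-3).
```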
In some embodiments, the disclosed subject matter is further configured to scale physics-informed machine learning with data and dimensions. Notably, the relative error scales quantitatively with dimensionality and number of training data. For example, the AIH engine can quantify the manner in which incorporating physics into a neural network design can significantly improve the learning and forecasting of dynamical systems, even nonlinear systems of many dimensions. Conventional neural networks and Hamiltonian neural networks can be trained on increasingly difficult dynamical systems. Further, the forecasting errors of these neural networks can be computed as the number of training data and number of system dimensions vary.
As indicated above, artificial neural networks are powerful tools being developed and deployed for a wide range of uses, especially for classification and regression problems. The artificial neural networks can approximate continuous functions, model dynamical systems, elucidate fundamental physics, and master strategy games. Notably, the scope of artificial neural networks can be extended by exploiting the symplectic structure of Hamiltonian phase space to forecast the dynamics of conservative systems that mix order and chaos.
Although recurrent neural networks have been used to forecast dynamics, the disclosed subject matter pertains to the ubiquitous feed-forward neural networks, especially as these neural networks learn dynamical systems of increasingly high dimensions. The ability of neural networks to learn high-dimensional dynamical systems accurately and efficiently is an important challenge for deep learning, as real-world systems are necessarily multi-component and thus typically high-dimensional. Conventional neural networks often perform significantly worse when encountering high-dimensional systems, which renders conventional NNs of limited use in complex real-world scenarios comprising many degrees of freedom. Thus, it is crucial to find methods that scale and continue to efficiently and accurately learn and forecast dynamics under increasing dimensionality.
As described herein, the AIH engine can be configured to implement, instantiate, execute, train, or manage the AHNN method (e.g., the AHNN), a conventional NN, and/or a HNN. For example, the AIH engine can conduct the systematic training of conventional and Hamiltonian neural networks on increasingly difficult dynamical systems, including linear and nonlinear oscillators and a coupled bistable chain. The AIH engine can also be configured to compute the forecasting errors of NNs or HNNs as the number of training data and the number of system dimensions vary. The disclosed subject matter can be configured to provide an alternate map-building perspective to understand and elucidate how HNNs learn differently from conventional NNs. In particular, the significant advantages offered by HNNs in learning and forecasting high-dimensional systems are demonstrated. The pivotal concept is that HNNs learn the single energy surface, while NNs learn the tangent space (where the derivatives are), which is more difficult for the same training parameters. As the number of derivatives increases with the dimension, so too do the advantages afforded by the HNN.
The AIH engine further assesses whether the advantage of neural-network inductive biases grows or shrinks with problem complexity. It achieves this goal by systematically varying the dimension of the system and the amount of training data while measuring the performance of a Hamiltonian neural network relative to a baseline neural network.
In some embodiments, a neural network is a nonlinear function that can be represented as:
$$o = F[i, w] = F_w[i]$$
that converts an input i to an output o according to its (typically very many) weights and biases w. Training a neural network with input-output pairs repeatedly updates the weights and biases by

$$w \leftarrow w - \eta\,\frac{\partial C}{\partial w}$$

to minimize a cost function C, where η is the learning rate, with the expectation that the weights and biases approach their optimal values $w \to \hat w$.
A conventional NN learning a dynamical system may be configured to intake a position and velocity $q_i$ and $\dot q_i$, output a velocity and acceleration $\dot q_o$ and $\ddot q_o$, and adjust the weights and biases to minimize the squared difference

$$C = (\dot q_i - \dot q_o)^2 + (\ddot q_i - \ddot q_o)^2$$

and ensure proper dynamics, as in neural network 1502.
To overcome limitations of conventional neural networks, especially when forecasting dynamical systems, recent neural network algorithms have incorporated ideas from physics. In particular, incorporating the symplectic phase space structure of Hamiltonian dynamics has proven very valuable. A Hamiltonian neural network (HNN) learning a dynamical system intakes position and momentum qi and pi but outputs a single energy-like variable H, which it differentiates according to Hamilton's recipe.
as shown in HNN 1504. The loss

$$C = (\dot q_i - \dot q_o)^2 + (\dot p_i - \dot p_o)^2$$
then assures symplectic dynamics, including energy conservation and motion on phase space tori. So rather than learning the derivatives, HNN 1504 learns the Hamiltonian function which is the generator of trajectories. Since the same Hamiltonian function generates both ordered and chaotic orbits (e.g., ordered and chaotic behavior), learning the Hamiltonian allows the network to forecast orbits outside the training set. In fact, HNN 1504 has the capability of forecasting chaos even when trained exclusively on ordered orbit data.
The HNN can also be configured to learn a linear oscillator. For a simple harmonic oscillator with mass m = 1, stiffness k = 1, position q, and momentum p, the Hamiltonian can be represented as

$$\mathcal{H} = (p^2 + q^2)/2,$$

so Hamilton's equations

$$\dot q = +\partial\mathcal{H}/\partial p = p, \qquad \dot p = -\partial\mathcal{H}/\partial q = -q$$

imply the linear equation of motion
$$\ddot q = -q.$$
The HNN can map its input to the paraboloid (i.e., the ‘HNN mapping equation’)

$$F_{\hat w}[\{q, p\}] = \mathcal{H} = (q^2 + p^2)/2,$$

but the NN maps its input to two intersecting planes (i.e., the ‘NN mapping equation’)

$$F_{\hat w}[\{q, \dot q\}] = \partial_t\{q, \dot q\} = \{\dot q, -q\} = \{F_1, F_2\},$$

as illustrated by the cyan surfaces in the corresponding drawings.
The AIH engine implements the neural networks in Mathematica using symbolic differentiation. The HNN and NN train using the same parameters, as shown in table 1702.
More generally, in d spatial dimensions and 2d-dimensional phase space, the quadratic oscillator Hamiltonian

$$\mathcal{H} = \sum_{n=1}^{d} \left( p_n^2 + q_n^2 \right)/2$$

has a linear restoring force, but the d-dimensional quartic oscillator

$$\mathcal{H} = \sum_{n=1}^{d} \left( p_n^2/2 + q_n^4/4 \right)$$
has a nonlinear restoring force.
In some embodiments, the AIH engine implements the neural networks in Python using automatic differentiation. The HNN and NN can train with the same parameters, some of which are optimized as in table 1704.
The forecasting error analysis is repeated for dimensions 1 ≤ d ≤ 9. Notably, the HNN maintains its forecasting edge over the NN in high dimensions, as summarized by the smoothed contour plot 2000.
Likewise, the heights and rainbow hues depicted in the corresponding plots summarize these forecasting errors.
Next, the forecasting error analysis is repeated for nonlinear oscillators.
Consider a chain of coupled bistable oscillators as shown in schematic 2300.
The AHNN can model each blade by the nonlinear spring force
$$f[q] = aq - bq^3$$
with a, b>0. The corresponding potential
$$V[q] = -\int f[q]\,dq = -\tfrac{1}{2}aq^2 + \tfrac{1}{4}bq^4$$
has an unstable equilibrium at q = 0 and stable equilibria at $q = \pm\sqrt{a/b}$. The AHNN can be configured to couple adjacent masses by linear springs of stiffness κ. For d identical masses m = 1, the Hamiltonian is

$$\mathcal{H} = \sum_{n=1}^{d} \left[ \frac{p_n^2}{2} + V[q_n] \right] + \sum_{n=0}^{d} \frac{\kappa}{2}\left( q_{n+1} - q_n \right)^2.$$
Hamilton's equations imply

$$\dot q_n = p_n, \qquad \dot p_n = a q_n - b q_n^3 + \kappa\left( q_{n+1} - 2 q_n + q_{n-1} \right).$$
Free boundary conditions are enforced by demanding $q_0 = q_1$ and $q_{d+1} = q_d$.
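For illustration, training orbits for such a chain might be generated roughly as in the following sketch; the parameter values, the SciPy integrator, and the function names are illustrative assumptions rather than the exact disclosed procedure.

```python
import numpy as np
from scipy.integrate import solve_ivp

def bistable_chain_rhs(t, z, a=1.0, b=1.0, kappa=0.5):
    """Right-hand side of the bistable chain with free boundary conditions.

    z = [q_1..q_d, p_1..p_d]; each mass feels the nonlinear spring force
    f[q] = a q - b q^3 plus linear coupling kappa to its nearest neighbors.
    """
    d = z.size // 2
    q, p = z[:d], z[d:]
    qpad = np.concatenate(([q[0]], q, [q[-1]]))       # q_0 = q_1, q_{d+1} = q_d
    coupling = kappa * (qpad[2:] - 2 * q + qpad[:-2])
    return np.concatenate([p, a * q - b * q**3 + coupling])

def sample_orbit(z0, t_max=10.0, dt=0.1):
    """Integrate one training orbit and return its coarsely sampled states."""
    t_eval = np.arange(0.0, t_max, dt)
    sol = solve_ivp(bistable_chain_rhs, (0.0, t_max), z0, t_eval=t_eval, rtol=1e-9)
    return sol.y.T
```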
As with the uncoupled high-dimensional systems, for sufficiently many training pairs, the HNN significantly outperforms the NN in forecasting the bistable chain.
Conventional neural networks are universal function approximators, but these neural networks may impractically require significant amounts of training data to approximate nonlinear dynamics. The disclosed Hamiltonian neural networks can efficiently learn and forecast dynamical systems that conserve energy, but these HNNs require special inputs called canonical coordinates, which may be hard to infer from data. In some embodiments, the AIH engine can be configured to prepend a conventional neural network to a Hamiltonian neural network such that the combination (e.g., a “generalized Hamiltonian neural network” (gHNN)) accurately forecasts Hamiltonian dynamics from generalized noncanonical coordinates. In some embodiments, the gHNN is a sub-method and/or subcomponent of the AHNN described herein. Examples may include a predator-prey competition model where the canonical coordinates are nonlinear functions of the predator and prey populations, an elastic pendulum characterized by nontrivial coupling of radial and angular motion, a double pendulum each of whose canonical momenta are intricate nonlinear combinations of angular positions and velocities, and real-world video of a compound pendulum clock.
Specifically, Hamiltonian neural networks typically train on canonical variables (e.g., positions and their conjugate momenta) that might not be known or accessible experimentally. In order to overcome the limitations and complexity of prior approaches, the disclosed subject matter can be configured to demonstrate a general extension of HNN, which utilizes a neural network preprocessor to i) train on a set of readily observable generalized coordinates, ii) learn the underlying Hamiltonian, and then iii) accurately forecast the dynamics, even if the training data is contaminated by noise. Example systems include the Lotka-Volterra predator-prey model, which unexpectedly can be converted into a Hamiltonian system by a nonlinear variable change, and an elastic pendulum, whose conjugate momenta are nonlinear combinations of its generalized coordinates. Other examples include an even more generic and complicated nonlinear double pendulum and a wooden pendulum clock recorded with a hand-held video camera.
In conventional feed-forward artificial neural networks, the activity of neurons in one layer

$$a_l = \sigma[\omega_l a_{l-1} + b_l]$$

is a vectorized sigmoid function of a linear combination of the activities in the previous layer, where $\omega_l$ and $b_l$ are that layer's weights and biases (exemplary hyperparameters appear in table 2500). Concatenating the layers yields a trainable function

$$y = f[x, \omega] = f_\omega[x],$$

where the weights and biases $\omega = \{\omega_l, b_l\}$. Given many training pairs $\tau = \{x_n, y_n\}$ and a “loss” function like the mean-square-error

$$L_\omega = \| y - f[x, \omega] \|^2,$$

an optimization algorithm like stochastic gradient descent finds the best weights and biases $\hat\omega_\tau$, and the trained neural network

$$y = f[x, \hat\omega_\tau]$$

approximates the desired function y[x].
To apply a neural network to a dynamical system

$$v = v[r, \omega] = v_\omega[r],$$

a NN 2640 intakes positions and velocities $r = \{q, \dot q\}$ (e.g., inputs 2602-2604) and outputs velocities and accelerations $\dot r = \{\dot q, \ddot q\}$ (e.g., outputs 2606-2608). With the loss function

$$L_\omega = \| \dot r - v_\omega[r] \|^2$$

and training pairs $\{r, \dot r\} = \{q, \dot q, \ddot q\}$, one optimizes to find the best $\hat\omega$ and uses the trained neural network

$$\dot r = v_{\hat\omega}[r]$$

to evolve the system forward or backward in time.
To create a Hamiltonian neural network 2650

$$\mathcal{H} = \mathcal{H}[R, \omega] = \mathcal{H}_\omega[R],$$

the AIH engine is configured to intake phase space or canonical coordinates (e.g., inputs 2610-2610) $R = \{Q, P\}$, where the conjugate momenta $P = \partial\mathcal{L}/\partial\dot Q$ and $\mathcal{L}$ is the Lagrangian, and output a scalar Hamiltonian $\mathcal{H}_\omega$ 2650 whose symplectic gradient generates the flow

$$\dot R = S\,\frac{\partial\mathcal{H}}{\partial R},$$

where S is the symplectic block matrix

$$S = \begin{pmatrix} 0 & +I \\ -I & 0 \end{pmatrix}.$$

The AIH engine may calculate the gradient $\partial\mathcal{H}/\partial R$ using automatic differentiation of the neural network output (e.g., output 2614) with respect to its input, and define the mean-square-error loss function

$$L_\omega = \left\| \dot R - S\,\frac{\partial\mathcal{H}_\omega}{\partial R} \right\|^2.$$

The AIH engine optimizes over training pairs $\{R, \dot R\} = \{Q, P, \dot Q, \dot P\}$ to find the best $\hat\omega$, and uses the trained neural network

$$\dot R = S\,\frac{\partial\mathcal{H}_{\hat\omega}}{\partial R}$$

to evolve the system.
Generalized Hamiltonian Neural Network (gHNN)
In some embodiments, the disclosed subject matter may be configured to learn a dynamical system's phase space vector field (or differential equations) from the experimentally observed generalized coordinates of sample orbits. However, for most problems, the generalized coordinates are not canonical coordinates. Therefore, to leverage the power of HNN, the AIH engine can be configured to embody a modified learning architecture where canonical coordinates are effectively learned in an unsupervised manner. To create a generalized HNN 2660

$$\mathcal{H} = \mathcal{H}[R[r], \omega] = \mathcal{H}_\omega[R[r]],$$

a neural network concatenation intakes generalized positions and velocities $r = \{q, \dot q\}$ (e.g., inputs 2618-2620), transforms them to positions and conjugate momenta $R = \{Q, P\}$ (e.g., data 2622-2624), or some combination thereof, and outputs a scalar Hamiltonian $\mathcal{H}_\omega$ (e.g., output 2626). The chain rule relates the observed and canonical rates of change,

$$\dot R = J\,\dot r,$$

where $J = \partial R/\partial r$ is a Jacobian matrix of partial derivatives. The AIH engine can be configured to invert to find

$$\dot r = J^{-1}\dot R = J^{-1} S\,\frac{\partial\mathcal{H}}{\partial R}$$

using Hamilton's equations indicated above. The AIH engine may then calculate the derivatives $\partial\mathcal{H}/\partial R$ and $\partial R/\partial r$ using automatic differentiation of the neural network outputs with respect to their inputs, and define the mean-square-error loss function

$$L_\omega = \left\| \dot r - J^{-1} S\,\frac{\partial\mathcal{H}_\omega}{\partial R} \right\|^2.$$

The AIH engine can then optimize over training pairs $\{r, \dot r\} = \{q, \dot q, \ddot q\}$ to find the best $\hat\omega$, and use the trained neural network

$$\dot r = J^{-1} S\,\frac{\partial\mathcal{H}_{\hat\omega}}{\partial R}$$

to evolve the system.
In the special case where the generalized coordinates are the canonical positions, q = Q, the Jacobian simplifies to the block matrix

$$J = \frac{\partial\{Q, P\}}{\partial\{q, \dot q\}} = \begin{pmatrix} I & 0 \\ \partial P/\partial q & \partial P/\partial \dot q \end{pmatrix}.$$
If observed or generalized coordinates $s = \{u, v\}$ relate to an unknown or implicit Hamiltonian with canonical coordinates Q and P, then the neural network architecture

$$\mathcal{H} = \mathcal{H}[R[s], \omega] = \mathcal{H}_\omega[R[s]]$$

intakes the observables u and v, transforms them to the unknown positions and conjugate momenta Q and P, and outputs a scalar Hamiltonian $\mathcal{H}_\omega$. In this case, the disclosed gHNN of the AHNN assumes a loss function of the same form as above, with $J = \partial R/\partial s$, and optimizes over training pairs $\{s, \dot s\} = \{u, v, \dot u, \dot v\}$ to find the best $\hat\omega$.
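By way of illustration only, the following is a minimal PyTorch sketch of such a concatenated architecture and its loss; the layer widths, class and function names, and the per-sample loop are illustrative assumptions, and rather than inverting the Jacobian as above, the sketch equivalently penalizes the mismatch between $J\dot r$ and the symplectic gradient $S\,\partial\mathcal{H}/\partial R$.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

class GeneralizedHNN(nn.Module):
    """NN preprocessor (generalized -> canonical coordinates) plus HNN head."""
    def __init__(self, dim=2, width=64):
        super().__init__()
        # preprocessor: r = (q, qdot) -> R = (Q, P)
        self.coords = nn.Sequential(nn.Linear(2 * dim, width), nn.Tanh(),
                                    nn.Linear(width, 2 * dim))
        # HNN head: R -> scalar Hamiltonian
        self.hamiltonian = nn.Sequential(nn.Linear(2 * dim, width), nn.Tanh(),
                                         nn.Linear(width, 1))

    def forward(self, r):
        return self.hamiltonian(self.coords(r))

def ghnn_loss(model, r_batch, rdot_batch):
    """Match J(r) rdot, the observed rate of change of the learned canonical
    coordinates, against the symplectic gradient S dH/dR of the learned H."""
    loss = torch.zeros(())
    for rn, rdotn in zip(r_batch, rdot_batch):     # per-sample for clarity, not speed
        rn = rn.detach().clone().requires_grad_(True)
        R = model.coords(rn)
        H = model.hamiltonian(R).sum()
        dHdR = torch.autograd.grad(H, R, create_graph=True)[0]
        n = R.numel() // 2
        # symplectic gradient: (dQ/dt, dP/dt) = (dH/dP, -dH/dQ)
        Rdot_ham = torch.cat([dHdR[n:], -dHdR[:n]])
        J = jacobian(model.coords, rn, create_graph=True)   # dR/dr
        Rdot_obs = J @ rdotn
        loss = loss + ((Rdot_obs - Rdot_ham) ** 2).sum()
    return loss / len(r_batch)
```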
The Lotka-Volterra predator-prey model is the “hydrogen atom” of mathematical ecology. It is also an interesting and highly nontrivial example of a system that has an underlying Hamiltonian structure, though it has no mechanical analogue, and its standard variables do not allow a Hamiltonian or Lagrangian description of its time evolution. Further, since this system arises in the context of population dynamics, there is no intuitive equivalent of kinetic or potential energy. So the construction of the Hamiltonian function via the usual route of kinetic and potential energy components is not possible here, and consequently it is highly nontrivial to guess the form of the Hamiltonian for this system.
Specifically, the coupled nonlinear differential equations governing the population of prey $n_1$ and predator $n_2$ are

$$\dot n_1 = +\alpha n_1 - \beta n_1 n_2,$$
$$\dot n_2 = -\gamma n_2 + \delta n_1 n_2.$$
Notice that neither variable (nor their combinations) can be naturally or readily identified as being coordinate-like or momentum-like. Also, interestingly the combination
$$C = \alpha \log n_2 - \beta n_2 + \gamma \log n_1 - \delta n_1$$
is a constant of motion but not a Hamiltonian that generates dynamics associated with the coupled nonlinear differential equations presented above. However, the exponential transformation
$$n_1 = e^{Q}, \qquad n_2 = e^{P}$$

implies the coupled system

$$\dot Q = +\alpha - \beta e^{P},$$
$$\dot P = -\gamma + \delta e^{Q},$$
where the combination

$$\mathcal{H} = \alpha P - \beta e^{P} + \gamma Q - \delta e^{Q}$$

is both a constant of the motion and a Hamiltonian that generates the coupled system dynamics via Hamilton's equations.
Thus, a nonlinear change of variables converts the system into a Hamiltonian form, and helps reveal the underlying Poisson structure that is not evident at all in standard variables.
Now the learning task is to predict the conservative dynamics by training on the “ordinary” coordinates $\{n_1, n_2\}$ and their derivatives $\{\dot n_1, \dot n_2\}$, which are the natural observables in the system, without knowing the “canonical” coordinates {Q, P}.
The training data may include 100 trajectories corresponding to different initial conditions, each with a different pseudo-energy, which demonstrates the famous cycling of predator and prey populations, where the state {n1, n2}={γ/δ, α/β} is an elliptical fixed point, and the state {n1, n2}={0, 0} is a hyperbolic fixed point. The sampling time Δt=0.1 is intentionally large to better approximate real-world data. Implementation details are indicated above. In some embodiments, parameters are α=β=δ=γ=1.
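For illustration, such coarsely sampled training trajectories might be generated roughly as follows; the initial-condition range and function names are illustrative assumptions, while the parameters α = β = γ = δ = 1 and the sampling time Δt = 0.1 follow the description above.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, n, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Predator-prey equations in the ordinary (noncanonical) variables."""
    n1, n2 = n
    return [alpha * n1 - beta * n1 * n2, -gamma * n2 + delta * n1 * n2]

def make_training_set(n_orbits=100, t_max=10.0, dt=0.1, seed=0):
    """Generate orbits {n1, n2} and derivatives {n1dot, n2dot} at coarse dt."""
    rng = np.random.default_rng(seed)
    t_eval = np.arange(0.0, t_max, dt)
    states, derivs = [], []
    for _ in range(n_orbits):
        n0 = rng.uniform(0.2, 2.0, size=2)           # assumed population range
        sol = solve_ivp(lotka_volterra, (0.0, t_max), n0,
                        t_eval=t_eval, rtol=1e-9)
        states.append(sol.y.T)
        derivs.append(np.array([lotka_volterra(0.0, s) for s in sol.y.T]))
    return np.concatenate(states), np.concatenate(derivs)
```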
Each neural network can be trained identically on the “ordinary” coordinates $\{n_1, n_2\}$ and their derivatives using the loss functions indicated above. Forecasts are made from unseen initial conditions.
The elastic pendulum is a simple mechanical system that exhibits a range of fascinating behavior. In fact, many real-world pendulums can be better modeled by an elastic, rather than an inextensible, suspension. The inverted versions of such elastic pendulums also have relevance in robotics and mechatronics. Further, in formal terms, it serves as a paradigm of a simple nonlinear system whose canonical momenta are nontrivial combinations of its coordinates. If the pendulum has length $\ell$ and is at an angle θ from downwards, then the pendulum mass m is at position

$$r = \{x, y\} = \ell\,\{\sin\theta, -\cos\theta\}$$

moving with velocity

$$v = \dot r = \dot\ell\,\{\sin\theta, -\cos\theta\} + \ell\,\{\cos\theta, \sin\theta\}\,\dot\theta.$$
The Lagrangian is

$$\mathcal{L} = \tfrac{1}{2} m\left(\dot\ell^2 + \ell^2\dot\theta^2\right) - \tfrac{1}{2} k\left(\ell - \ell_0\right)^2 + m g \ell \cos\theta,$$

where m is the mass, k is the stiffness, $\ell_0$ is the equilibrium length, and g is the gravitational field. The conjugate momenta are

$$p_\ell = \frac{\partial\mathcal{L}}{\partial\dot\ell} = m\dot\ell, \qquad p_\theta = \frac{\partial\mathcal{L}}{\partial\dot\theta} = m\ell^2\dot\theta,$$

where $p_\theta$ is not simply mass times velocity.
The learning task is to predict the conservative dynamics by training on the generalized coordinates $\{\ell, \theta\}$ and their derivatives $\{\dot\ell, \dot\theta, \ddot\ell, \ddot\theta\}$ without knowing the canonical coordinates $\{\ell, \theta, p_\ell, p_\theta\}$. Parameters are $m = g = \ell_0 = 1$ and k = 4. The training data consists of 100 trajectories corresponding to different initial conditions, each with a different energy, again coarsely sampled.
Each neural network trains identically on the generalized coordinates $\{\ell, \theta\}$ and their derivatives using the loss functions presented above. Forecasts are made from unseen initial conditions.
As a more challenging example, consider librations of a double pendulum. This is a classic chaos demonstrator, both of whose canonical momenta are nontrivial combinations of its coordinates. If the pendulum lengths $\ell_1$ and $\ell_2$ are at angles $\theta_1$ and $\theta_2$ from downwards, then the masses $m_1$ and $m_2$ are at positions

$$r_1 = \{x_1, y_1\} = \ell_1\{\sin\theta_1, -\cos\theta_1\},$$
$$r_2 = \{x_2, y_2\} = \ell_2\{\sin\theta_2, -\cos\theta_2\} + r_1$$
moving with linear velocities

$$v_1 = \dot r_1 = \ell_1\{\cos\theta_1, \sin\theta_1\}\,\dot\theta_1,$$
$$v_2 = \dot r_2 = \ell_2\{\cos\theta_2, \sin\theta_2\}\,\dot\theta_2 + v_1.$$
The Lagrangian is

$$\mathcal{L} = \tfrac{1}{2} m_1 v_1^2 + \tfrac{1}{2} m_2 v_2^2 - m_1 g\, y_1 - m_2 g\, y_2,$$

where g is the gravitational field. The conjugate momenta are $p_1 = \partial\mathcal{L}/\partial\dot\theta_1$ and $p_2 = \partial\mathcal{L}/\partial\dot\theta_2$,
where neither p1 nor p2 is simply mass times velocity.
The learning task is to predict the conservative dynamics by training on the generalized coordinates $\{\theta_1, \theta_2\}$ and their derivatives $\{\dot\theta_1, \dot\theta_2, \ddot\theta_1, \ddot\theta_2\}$ without knowing the canonical coordinates $\{\theta_1, \theta_2, p_1, p_2\}$. Parameters are $m_1 = m_2 = \ell_1 = \ell_2 = g = 1$. The training data consists of 100 trajectories corresponding to different initial conditions.
Each neural network trains identically on the generalized coordinates $\{\theta_1, \theta_2\}$ and their derivatives using the loss functions presented above. Forecasts are made from unseen initial conditions.
As a final real-world example, consider a wooden pendulum clock. The falling weight drives the pendulum and a deadbeat escapement regulates its libration, overcoming dissipation so the motion is approximately (but not identically) of constant amplitude and frequency. A hand-held smartphone can be used to record 100 seconds of motion at 30 frames-per-second. Video tracking of the ends of the pendulum records the pendulum librations. Trigonometry extracts the angles from the coordinate differences, and finite differencing with a Savgol filter estimates the angular velocities and accelerations. It is determined that the NN fails to learn the motion and its forecast collapses, while a similarly small gHNN can quickly learn a good approximation to the motion.
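The angular velocities and accelerations can be estimated from the tracked coordinates roughly as in the following sketch; the window length, polynomial order, and function names are illustrative assumptions, while the 30 frames-per-second spacing follows the description above.

```python
import numpy as np
from scipy.signal import savgol_filter

def angles_from_tracks(bob_xy, pivot_xy):
    """Trigonometry: pendulum angle from tracked pivot and bob coordinates.

    The sign convention depends on the image coordinate system; here the
    angle is measured from the downward vertical.
    """
    dx = bob_xy[:, 0] - pivot_xy[:, 0]
    dy = bob_xy[:, 1] - pivot_xy[:, 1]
    return np.arctan2(dx, dy)

def smooth_derivatives(theta, fps=30, window=11, polyorder=3):
    """Savitzky-Golay smoothing and differentiation of the angle time series."""
    dt = 1.0 / fps
    theta_s = savgol_filter(theta, window, polyorder)
    theta_dot = savgol_filter(theta, window, polyorder, deriv=1, delta=dt)
    theta_ddot = savgol_filter(theta, window, polyorder, deriv=2, delta=dt)
    return theta_s, theta_dot, theta_ddot
```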
In some embodiments, the AHNN can be further configured to utilize compactification for purposes of handling unbounded coordinates. Namely, physics-informed machine learning has recently been shown to efficiently learn complex trajectories of nonlinear dynamical systems, even when order and chaos coexist. However, care must be taken when one or more variables are unbounded, such as in rotations. Here, the AHNN uses the framework of HNNs to learn the complex dynamics of nonlinear single and double pendulums, which can both librate and rotate, by mapping the unbounded phase space onto a compact cylinder. Moreover, the AHNN can successfully forecast the motion of these challenging systems, handling both bounded and unbounded motion. It is also evident that the HNN can yield an energy surface that closely matches the surface generated by the true Hamiltonian function. Further, the relative energy error for HNN has been observed to decrease as a power law with the number of training pairs. Accordingly, HNNs clearly outperform conventional neural networks quantitatively.
As indicated above, artificial neural networks (ANNs) are powerful tools being developed and deployed in science and industry for a wide range of uses (e.g., especially for classification and regression problems) in applications ranging from pattern recognition to game playing. Although ANNs incorporate nonlinearity in their activation functions, ANNs falter when confronting nonlinear dynamics, which can typically yield qualitatively different behavior, such as vibrations and rotations, or order and chaos. Another striking drawback of conventional neural networks extrapolating time series in Hamiltonian systems is that they may not conserve energy, and the predicted orbits often wander off the energy surface or shoot away to infinity.
In this context, the AHNN can be configured to leverage the symplectic structure of Hamiltonian phase space. The aforementioned novel physics-inspired framework of HNN internalizes the gradient of an energy-like function in a network's weights and biases. So HNNs embed Hamiltonian dynamics in their operation and ensure that the neural network respects Hamiltonian time-translational symmetry. Importantly, the HNN algorithm incorporates broad principles of energy conserving and volume preserving flows arising from an underlying Hamiltonian function, without invoking any details of its explicit form. It has been demonstrated that HNN can recognize the presence of order and chaos as well as challenging regimes where both these very distinct dynamics coexist. The success of HNN to discern chaos has been explicitly quantified by metrics like Lyapunov spectra and smaller alignment indices, in benchmark dynamical systems such as the paradigmatic Hénon-Heiles potential, and in chaotic billiards. Notably, the physics-informed HNN algorithm significantly enhances the scope of conventional neural networks by successfully forecasting the dynamics of conservative systems, spanning regular ordered behavior to complex chaotic dynamics.
Further, the improvement in learning and forecasting of dynamical systems was quantified by training conventional and Hamiltonian neural networks on increasingly difficult dynamical systems, and computing their forecasting errors as the number of training data and number of system dimensions varied. The disclosed subject matter utilizes the improved scaling with data and dimensions achieved through incorporation of physics into neural network design. Since nonlinear dynamics is ubiquitous, this neural network “superpower” is widely and immediately applicable.
In some embodiments, the disclosed subject matter utilizes the HNN to model the nonlinear dynamics of pendulums and double pendulums. This is a challenging test-bed as the pendulum can have two very distinct motions. A pendulum can librate, i.e., move back-and-forth, with the motion having turning points. A pendulum may also rotate end-over-end, a case where the angles are unbounded quantities as there are no turning points, and this can frustrate standard forecasting techniques. For the single pendulum, these two qualitatively different motions are separated by a special curve in phase space, the separatrix, which serves as a boundary between libration (also known as vibration) and rotation. The AIH engine addresses this by mapping the pendulum motion onto a cylindrical phase space and using the coordinates on the cylinder to learn and forecast the motion on both sides of the phase space separatrix.
In some embodiments, a trained neural network is a concatenation of layers of nodes called “neurons” that instantiates a nonlinear function represented as:
$$o = N[i, \hat W, \hat b] = N_{\hat W \hat b}[i],$$
where Ŵ and {circumflex over (b)} are the optimal parameters (called weights W and biases b) to convert a given input i to a desired output o. When forecasting a dynamical system, a HNN intakes positions and momenta {q, p} and outputs the Hamiltonian
$$\mathcal{H} = N_{\hat W \hat b}[\{q, p\}],$$
as shown in the accompanying drawings. In contrast, a conventional NN intakes positions and velocities and outputs their rates of change,

$$\partial_t\{q, \dot q\} = N_{\hat W \hat b}[\{q, \dot q\}],$$
where overdots indicate time differentiation.
In some embodiments, the HNN algorithm is configured to output the scalar Hamiltonian function $\mathcal{H}$, take its gradient to find the position and momentum rates of change, and minimize the loss presented below:
$$\mathcal{L}_{\mathrm{HNN}} = (\dot q_t - \partial\mathcal{H}/\partial p)^2 + (\dot p_t + \partial\mathcal{H}/\partial q)^2.$$
This loss function enforces the basic structure of Hamilton's equations of motion, for any Hamiltonian function.
Accordingly, the fundamental distinction between the NN and the HNN includes the following: the NN learns the orbits (and/or dynamic behaviors), while the HNN learns the Hamiltonian. Geometrically, the NN learns the generalized velocities, the dual mappings $\{q, \dot q\} \to \dot q$ and $\{q, \dot q\} \to \ddot q$, while the HNN learns the Hamiltonian generator function, the single mapping $\{q, p\} \to \mathcal{H}$, whose (symplectic) gradient gives the generalized velocities $\{\dot q, \dot p\}$. With the same resources, it has been convincingly demonstrated that the HNN outperforms the NN, and the advantage grows as the phase space dimension increases, where q and p are multi-component vectors.
In some embodiments, the Hamiltonian of a pendulum with unit length and unit mass, (angular) position q = θ, and (angular) momentum p is

$$\mathcal{H} = p^2/2 - \cos q,$$

where $p = L = I\omega = I\dot\theta$ and where I denotes the moment of inertia. For a simple pendulum, I is given by the product of the mass and the square of the length, so I = 1 in this case, as the unit length and unit mass are known. Hamilton's equations of motion

$$\dot q = +\partial\mathcal{H}/\partial p = p, \qquad \dot p = -\partial\mathcal{H}/\partial q = -\sin q$$

imply Newton's equation of motion
$$\ddot q = -\sin q.$$
For the nonlinear pendulum, the HNN maps its input to the surface

$$N_{\hat W \hat b}[\{q, p\}] = \mathcal{H} = p^2/2 - \cos q,$$

but the conventional NN maps its input to a plane and an intersecting sinusoid

$$N_{\hat W \hat b}[\{q, \dot q\}] = \partial_t\{q, \dot q\} = \{\dot q, -\sin q\}.$$
Unlike the simpler harmonic oscillator, the nonlinear pendulum exhibits two qualitatively different kinds of motion: back-and-forth libration for small energies and over-the-top rotation for large energies, which challenge conventional neural networks. For rotations, (angular) position q increases without bounds and cannot be scaled to a finite range. Using the representation of the (angular) position modulo 2π introduces discontinuities that violate the neural network universal approximation theorems.
In some embodiments, the AHNN tackles this problematic issue by wrapping the phase space onto a cylinder, so the network never sees the unbounded angle directly. The (angular) position enters only through the bounded cylinder coordinates

$$\{x, y\} = \{\cos q, \sin q\}.$$

Inversely, $q = \arctan[y/x]$ (up to the quadrant). Thus the chain rule gives

$$\frac{\partial\mathcal{H}}{\partial q} = \frac{\partial\mathcal{H}}{\partial x}\frac{\partial x}{\partial q} + \frac{\partial\mathcal{H}}{\partial y}\frac{\partial y}{\partial q} = -y\,\frac{\partial\mathcal{H}}{\partial x} + x\,\frac{\partial\mathcal{H}}{\partial y}.$$

Hence the network input is $\{q, p\} \to \{x, y, p\}$ and the output derivatives are

$$\dot q = +\frac{\partial\mathcal{H}}{\partial p}, \qquad \dot p = -\frac{\partial\mathcal{H}}{\partial q} = y\,\frac{\partial\mathcal{H}}{\partial x} - x\,\frac{\partial\mathcal{H}}{\partial y},$$

where this choice of equation avoids numerical instability when x or y are near zero.
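A hedged sketch of this compactified evaluation follows, assuming a trained PyTorch module `hnn` that intakes the 3-vector {x, y, p}; the function name is an illustrative assumption, and the chain-rule combination matches the equations above.

```python
import torch

def pendulum_flow(hnn, q, p):
    """Hamilton's equations evaluated through the cylinder map {x, y} = {cos q, sin q}.

    hnn is assumed to be a trained module that intakes the 3-vector (x, y, p)
    and outputs the scalar H.  The chain rule dH/dq = -y dH/dx + x dH/dy
    recovers the torque without feeding the unbounded angle q to the network.
    """
    x, y = torch.cos(q), torch.sin(q)
    z = torch.stack([x, y, p]).detach().requires_grad_(True)
    H = hnn(z).sum()
    dHdx, dHdy, dHdp = torch.autograd.grad(H, z)[0]
    q_dot = dHdp                           # dq/dt = +dH/dp
    dHdq = -y * dHdx + x * dHdy            # chain rule back to the angle
    p_dot = -dHdq                          # dp/dt = -dH/dq
    return q_dot, p_dot
```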
In some embodiments, the AIH engine implements the neural networks in Python using automatic differentiation. The HNN and the NN can be trained using the same hyperparameters, as summarized in table 3100.
In some embodiments, the AHNN can model a nonlinear double pendulum. For a double pendulum 3400 exhibiting periodic and chaotic motion, the positions

$$x_1 = +\sin q_1, \qquad y_1 = -\cos q_1$$

and

$$x_2 = x_1 + \sin q_2, \qquad y_2 = y_1 - \cos q_2$$
yield the Lagrangian, derivatives of which generate the momenta
$$p_1 = 2\dot q_1 + \dot q_2 \cos[q_1 - q_2],$$
$$p_2 = \dot q_2 + \dot q_1 \cos[q_1 - q_2].$$
A Legendre transformation of the Lagrangian generates the Hamiltonian, and Hamilton's equations of motion yield the corresponding phase space flow for $\{q_1, q_2, p_1, p_2\}$.
In the two-dimensional phase space of the single pendulum, a one-dimensional curve separates the bound and unbound orbits (e.g., dynamic behaviors). While this is topologically impossible in the four-dimensional phase space of the double pendulum, the dynamics still exhibit the qualitatively different motions of libration and rotation of the individual pendulum masses. The boundaries demarcating distinct dynamical behaviors in the high-dimensional phase space of the double pendulum are very complex.
As before, the AIH engine can implement the neural networks in Python using automatic differentiation.
Physics-informed machine learning has been shown to efficiently learn complex trajectories of nonlinear dynamical systems. However, one encounters problems when one or more variables are unbounded, such as in rotations. Notably, the AHNN can be configured to use the framework of HNNs to learn the complex dynamics of nonlinear single and double pendulums that can both librate back-and-forth and rotate end-over-end. The unbounded motion may be handled by mapping onto a cylindrical phase space and working with the compact cylinder coordinates. The AHNN demonstrates that this approach is able to successfully learn and forecast the qualitatively distinct behavior on both sides of the phase space separatrix. It is also evident that the HNN can yield an energy surface which is a close match to the surface generated by the true Hamiltonian function. Lastly, it is observed that the relative energy error for HNN decreases as a power law with the number of training pairs, with HNN clearly outperforming conventional neural networks quantitatively.
It will be appreciated that the following blocks describe an exemplary process for utilizing physics augmented neural networks configured for operating in environments that mix order and chaos.
In block 3802, a NN pre-processor is utilized to convert generic coordinates associated with a dynamical system to canonical coordinates. In some embodiments, an exemplary NN pre-processor that is configured to obviate the need for special canonical coordinates is described above.
In block 3804, a Hamiltonian neural network (HNN) is concatenated to the NN pre-processor to create a generalized HNN (and/or AHNN). In some embodiments, an exemplary generalized HNN is formed by combining the NN pre-processor and an HNN. The resulting generalized HNN is described above.
In block 3806, the generalized HNN is trained to learn nonlinear dynamics present in the dynamical system from generic training data. Training can be conducted via conventional means and/or by the AIH engine described above.
In block 3808, the trained generalized HNN is utilized to forecast the nonlinear dynamics. In some embodiments, the utilization of the trained generalized HNN is managed by an operator utilizing the host computer platform described above.
In block 3810, chaotic behavior (e.g., chaotic orbits) is quantified from the forecasted nonlinear dynamics to discover and map one or more transitions between orderly states and chaotic states exhibited by the dynamical system. In some embodiments, the trained gHNN can be utilized by a user to quantify chaotic orbits and/or other behavior by receiving new input data.
It will be understood that various details of the presently disclosed subject matter may be changed without departing from the scope of the presently disclosed subject matter. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation.
This application claims the priority benefit of U.S. Provisional Patent Application Ser. No. 63/086,549, filed Oct. 1, 2020, the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with government support under grant number N00014-16-1-3056 awarded by the U.S. Office of Naval Research. The government has certain rights in the invention.
Number | Date | Country
--- | --- | ---
63/086,549 | Oct. 1, 2020 | US